PATENT
Customer No. 32425 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re Application of:
Peter DROGE, Nicole CHRIST and Elke LORBACH

Serial No.: 10/082,772

Filed: February 22, 2002

For: SEQUENCE-SPECIFIC DNA RECOMBINATION IN EUKARYOTIC CELLS

Group Art Unit: 1636

Examiner: Q. Nguyen

Atty. Dkt. No.: DEBE:008US/SLH

Confirmation No.: 4391



REPLY BRIEF 



Commissioner for Patents 
P.O. Box 1450 

Alexandria, VA 22313-1450
Dear Sir: 

This Reply Brief is filed in response to the Examiner's Answer mailed on August 5,
2008. Appellant's brief is due October 6, 2008, October 5th being a Sunday. No fees are
believed due in connection with this filing; however, should any fees be due, appellants authorize
the Commissioner to debit Fulbright & Jaworski L.L.P. Deposit Account No. 50-1212/DEBE:008US.



I. Real Party In Interest 

The real party in interest is the assignee, Peter Droge. 



II. Related Appeals and Interferences 

There are no related appeals or interferences.

III. Status of the Claims

Claims 1-28 were filed with the original application, but these claims were canceled in a 
preliminary amendment in favor of new claims 29-60. Claims 29-51 and 58 were elected in a 
response to restriction requirement, and claims 31, 40-42, 52-57, 59 and 60 were canceled during 
prosecution. Thus, claims 29, 30, 32-39, 43-51 and 58 are pending in the application, stand 
rejected and are appealed. A copy of the appealed claims is attached as Appendix A. 

IV. Status of the Amendments 

No unentered amendments have been offered. 

V. Summary of the Claimed Subject Matter 

Independent claim 29, drawn to a method of sequence specific recombination of DNA in 
a eukaryotic cell using attB, attP, attR and attL sequences, along with wild-type or int-h or int- 
h/218 integrases, is supported in the application as filed at page 8, line 26, to page 9, line 6, and 
page 14, lines 9-14. 

VI. Ground of Rejection to be Reviewed on Appeal 

1. Are claims 29, 30, 32, 33, 36, 38, 44-48 and 58 obvious under 35 U.S.C. §103
over the combined disclosures of Crouzet et al. (Exhibit 1) and Christ & Droge
(Exhibit 2)?

2. Are claims 29 and 43 obvious under 35 U.S.C. §103 over Crouzet et al., Christ &
Droge and Capecchi et al. (Exhibit 3)?

3. Are claims 29, 34-37 and 39 obvious under 35 U.S.C. §103 over Crouzet et al.,
Christ & Droge and Hartley et al. (Exhibit 4)?

4. Are claims 29 and 49-51 obvious under 35 U.S.C. §103 over Crouzet et al., Christ
& Droge, Hartley et al. and Calos et al. (Exhibit 7)?

VII. Argument

A. Standard of Review 

Findings of fact and conclusions of law by the U.S. Patent and Trademark Office must be 
made in accordance with the Administrative Procedure Act, 5 U.S.C. §706(A), (E), 1994. 
Dickinson v. Zurko, 527 U.S. 150, 158 (1999). Moreover, the Federal Circuit has held that 
findings of fact by the Board of Patent Appeals and Interferences must be supported by 
"substantial evidence" within the record. In re Gartside, 203 F.3d 1305, 1315 (Fed. Cir. 2000). 
In In re Gartside, the Federal Circuit stated that "the 'substantial evidence' standard asks 
whether a reasonable fact finder could have arrived at the agency's decision." Id. at 1312. 
Accordingly, it necessarily follows that an examiner's position on appeal must be supported by 
"substantial evidence" within the record in order to be upheld by the Board of Patent Appeals 
and Interferences. 

B. Rejection of Claims 29 and 49-51 Under 35 U.S.C. §103

Claims 29 and 49-51 were newly rejected in the office action of February 21, 2008 over
Crouzet, Calos, Hartley and Christ & Droge. Inadvertently, appellants did not identify that new
and distinct rejection when filing their brief on appeal. Appellants submit, however, that this
omission was in no way intended as acquiescence in this rejection. Moreover, given the rejection
of claim 29 in each of the other three grounds for rejection, this was in fact a unique and new
rejection only of claims 49-51. Indeed, as stated, the rejection merely reapplies three references
discussed elsewhere in the brief, with the addition of Calos. Yet this latter reference is cited
merely for a teaching of modified integrases and an excision factor in the context of a second
sequence specific recombination (see claims 49-51). Regardless, the teachings of Calos cannot
correct the manifest defects already pointed out with respect to Crouzet, Hartley and Christ &
Droge, and as such, the rejection relies on the same flawed premise as other rejections already
traversed and argued many times over. Thus, for reasons given throughout the prosecution, in
the previously filed appeal brief, and as set forth above, reversal of this rejection is requested as
well.

C. General Discussion of the References 

Crouzet et al. is directed, generally, to the provision of medicinal DNA molecules
suitable for gene therapeutic use, more precisely, for in vivo gene transfer. For this purpose,
molecules should lack any non-therapeutic regions, e.g., origin of replication, resistance gene,
non-relevant genes. Furthermore, these molecules should be prepared in high amounts and
purity in supercoiled form appropriate for pharmaceutical use. To achieve these goals, Crouzet et
al. provide a method for the preparation and purification of mini-circles which fulfill the above
mentioned criteria. Thus, the method according to their invention lies in the production of
mini-circles by excision from a plasmid or from a chromosome by site-specific recombination,
whereby the site-specific recombination may be carried out in vivo in a host cell or in vitro on a
plasmid preparation.

For a person of ordinary skill in the art, the most efficient way to produce plasmids in 
high amounts and purity is the production in bacteria. Bacteria can be easily grown to high cell 
densities in a short time. Furthermore, plasmids replicate easily in the bacterial host and can be 
maintained at high copy numbers in each cell. In addition, plasmids of high quality can be 
isolated in high yields by well known and established standard techniques. The ease and 
efficiency of production with high yield holds true for the generation of mini-circles by excision 
from a bacterial chromosome as well. The propagation of plasmids in bacteria and the excision
of the mini-circles using the bacteriophage lambda are therefore the approaches described in the
examples provided by Crouzet et al. But neither methods nor examples for in vivo excision
in eukaryotes are provided, and thus the reference gives no proof that lambda
integrase would actually work in eukaryotes, much less whether one would have to provide
accessory factors, or whether the yields and quality would be sufficient.

Moreover, it was completely unknown whether supercoiled DNA could even be obtained
following the teachings of the reference. At that time, no one had shown functionality of lambda
integrase in eukaryotes, so Crouzet et al. could not even rely on any published data. In
eukaryotes, DNA in the nucleus is organized in the form of chromatin. Nucleosomes, the primary
repeating unit of chromatin, package DNA by wrapping it around an octamer of histone proteins.
Crouzet et al. did not show that, after use of their cited standard DNA purification techniques,
mini-circles in a supercoiled form would be obtained. Therefore, Crouzet et al.'s inclusion of
any and all types of cell host, especially mammalian animal cells, was prophetic and completely
unsupported from a scientific standpoint. The embodiment Crouzet et al. clearly had in mind was
the production of mini-circles in bacteria - a notion further supported by the methods and
descriptions given in the reference. For example, if excision from a plasmid were to be
performed in eukaryotes, the plasmid itself would have to comprise two origins of replication: one for
propagation in bacteria to allow for the cloning of the plasmid, and one for replication in
eukaryotes. However, the preferred or particular plasmid according to Crouzet et al.'s invention
comprises only a bacterial origin of replication (col. 5, line 33; col. 6, lines 14 and 48). In
addition, for excision of the DNA molecule from the genome of the host cell, the descriptions
relate solely to techniques used in prokaryotes (col. 8, lines 52-67; col. 9, lines 1-26). At col. 8,
line 64, the reference even states "integrated in the genome of the bacterium." Thus, it cannot be argued
that Crouzet et al. supports or suggests the use of eukaryotic cells.



55284472.1 






Capecchi et al. merely provides better means to screen and select for successful 
homologous recombination events in eukaryotic cells by using positive-negative selection 
vectors. However, one has to differentiate between homologous and site-specific recombination.
The former does not take place at a specific site and relies on sequence similarity (or identity) between
a piece of DNA introduced into a host cell and the host DNA; the DNA strand exchange can occur anywhere in
the homologous regions. Also, it involves endogenous enzymes. On the other hand, site-specific
recombination takes place at a specific site and makes use of enzymes (recombinases) which
catalyse DNA strand exchange even between molecules that have only limited sequence
homology. Capecchi et al. neither teach the use of site-specific recombination in eukaryotes, nor
a combined approach of homologous recombination and site-specific recombination. Thus, the 
deficiencies set forth above with respect to Crouzet et al. remain. 

Hartley et al. provide for recombination cloning of plasmids in vitro, and either the in
vitro or in vivo selection of recombinatorial cloning products, thereby replacing more tedious,
conventional cloning procedures which include digestion with restriction enzymes followed by
ligation reactions. Hartley et al. teach the use of lambda integrase for site-specific recombination,
but the actual recombinations were performed in vitro with purified wild-type lambda
recombinase and purified accessory factors Xis and IHF. Also, Hartley et al. teach that selection
for the actual recombinatorial cloning product (which is one of the products in the in vitro
recombination mixture; others are by-product, co-integrate, insert and vector donor) can be
performed either in vitro by using, for example, rare restriction enzyme sites (col. 13, lines 12-26),
or in vivo in host cells by using suitable selection markers (col. 9, lines 8-67; col. 10, lines 1-31).
However, while hosts are defined as any prokaryotic or eukaryotic organism that can be a
recipient of the recombinatorial cloning product (col. 8, lines 12-13), the actual recombination
reaction is performed in vitro. Therefore, Hartley et al. do not teach site-specific recombinations
in eukaryotic cells, nor do they teach means for doing so, nor do they even claim the execution
of site-specific recombinations in eukaryotic cells for recombinatorial cloning.

Christ & Droge teach assays performed only in bacteria. Christ & Droge do not teach
site-specific recombinations in eukaryotic cells, nor do they provide means for doing so. In
contrast to bacteria, in which genomic or plasmid DNA is negatively supercoiled, DNA in the
nucleus of eukaryotes is organized in the form of chromatin. Nucleosomes, the primary repeating
unit of chromatin, package DNA by wrapping it around an octamer of histone proteins.
Therefore, the DNA topology is quite different, and it was not obvious to an ordinarily skilled
artisan to deduce from the existing data that the mutant recombinases would work inside the
nucleus with completely different DNA topology on an inter- or intramolecular level. It also was
not clear that the mutant recombinases would even be transported into the nucleus, or that they
would have any (or sufficient) biological activity or stability once in the nucleus, because the
physiological conditions in a nucleus are different from those in a bacterium or an in vitro
system.

D. Conclusion

In light of the foregoing, appellants again respectfully submit that all pending claims are
non-obvious under 35 U.S.C. §103. Therefore, it is respectfully requested that the Board reverse
each of the pending rejections.

Respectfully submitted, 



Date: October 6, 2008 



Fulbright & Jaworski L.L.P. 
600 Congress Ave., Suite 2400 
Austin TX 78701 
512-474-5201 



Steven L. Highlander
Reg. No. 37,642

VIII. APPENDIX A - APPEALED CLAIMS



29. A method of sequence specific recombination of DNA in a eukaryotic cell, comprising: 

(a) providing said eukaryotic cell, said cell comprising a first DNA segment
integrated into the genome of said cell, said first DNA segment comprising an
attB sequence according to SEQ ID NO:1 or a derivative thereof, an attP
sequence according to SEQ ID NO:2 or a derivative thereof, an attL sequence
according to SEQ ID NO:3 or a derivative thereof, or an attR sequence according
to SEQ ID NO:4 or a derivative thereof;

(b) introducing a second DNA segment into said cell, wherein if said first DNA
segment comprises an attB sequence according to SEQ ID NO:1 or a derivative
thereof, said second DNA segment comprises an attP sequence according to SEQ
ID NO:2 or a derivative thereof, wherein if said first DNA segment comprises an
attP sequence according to SEQ ID NO:2 or a derivative thereof, said second
DNA segment comprises an attB sequence according to SEQ ID NO:1 or a
derivative thereof, wherein if said first DNA segment comprises an attL sequence
according to SEQ ID NO:3 or a derivative thereof said second DNA segment
comprises an attR sequence according to SEQ ID NO:4 or a derivative thereof, or
wherein if said first DNA segment comprises an attR sequence according to SEQ
ID NO:4 or a derivative thereof said second DNA segment comprises an attL
sequence according to SEQ ID NO:3 or a derivative thereof; and

(c) further comprising providing to said cell a modified bacteriophage lambda
integrase Int, wherein said modified Int is Int-h or Int-h/218, which induces
sequence specific recombination through said attB and attP or attR and attL
sequences.

30. The method of claim 29, wherein said first DNA segment was introduced into the 
genome of said cell by recombinant methods. 

32. The method of claim 29, wherein said first DNA segment comprises an attB sequence
according to SEQ ID NO:1 or a derivative thereof, and said second DNA comprises an
attP sequence according to SEQ ID NO:2 or a derivative thereof.

33. The method of claim 29, wherein said first DNA segment comprises an attP sequence
according to SEQ ID NO:2 or a derivative thereof, and said second DNA comprises an
attB sequence according to SEQ ID NO:1 or a derivative thereof.

34. The method of claim 29, wherein said first DNA segment comprises an attL sequence
according to SEQ ID NO:3 or a derivative thereof, and said second DNA sequence
comprises an attR sequence according to SEQ ID NO:4 or a derivative thereof, further
comprising, in step (d), providing to said cell a Xis factor.

35. The method of claim 29, wherein said first DNA segment comprises an attR sequence
according to SEQ ID NO:4 or a derivative thereof, and said second DNA sequence
comprises an attL sequence according to SEQ ID NO:3 or a derivative thereof, further
comprising, in step (d), providing to said cell a Xis factor.

36. The method of claim 29, further comprising providing to said cell a third DNA segment 
comprising an Int gene. 

37. The method of claim 36, further comprising providing to said cell a fourth DNA segment 
comprising a Xis factor gene, respectively. 

38. The method of claim 36, wherein said third DNA segment further comprises a regulatory 
sequence effecting a spatial and/or temporal expression of the Int gene. 

39. The method of claim 37, wherein said fourth DNA segment further comprises a 
regulatory sequence effecting a spatial and/or temporal expression of the Xis factor gene. 

43. The method according to claim 29, wherein said first and/or second DNA segment further 
comprise a sequence effecting integration of said first and/or second DNA segment into 
the genome of said cell by homologous recombination. 

44. The method of claim 29, wherein said first and/or second DNA segment further
comprises a sequence coding for a polypeptide of interest.

45. The method of claim 44, wherein said polypeptide of interest is a structural protein, an 
endogenous or exogenous enzyme, a regulatory protein or a marker protein. 

46. The method of claim 29, wherein said first and second DNA segment are introduced into 
the eukaryotic cell on the same DNA molecule. 

47. The method of claim 29, wherein said eukaryotic cell is a mammalian cell. 

48. The method of claim 47, wherein said mammalian cell is a human, simian, mouse, rat, 
rabbit, hamster, goat, bovine, sheep or pig cell. 

49. The method of claim 29, further comprising: 

(d) performing a second sequence specific recombination of DNA by Int-h or
Int-h/218 and a Xis factor after the steps (a)-(c), wherein said first DNA sequence
comprises said attB sequence according to SEQ ID NO:1 or a derivative thereof
and said second DNA sequence comprises the attP sequence according to SEQ ID
NO:2 or a derivative thereof, or wherein said first DNA sequence comprises said
attP sequence according to SEQ ID NO:2 or a derivative thereof and said second
DNA sequence comprises the attB sequence according to SEQ ID NO:1 or a
derivative thereof.

50. The method of claim 49, further introducing a further DNA sequence into said cells, the 
further DNA sequence comprising a Xis factor gene. 

51. The method of claim 50, wherein said further DNA sequence comprises further a 
regulatory DNA sequence effecting a spatial and/or temporal expression of said Xis 
factor gene. 

58. An isolated eukaryotic cell obtainable according to the method of claim 29.

IX. APPENDIX B - EVIDENCE CITED

Exhibit 1 - Crouzet et al. 

Exhibit 2 - Christ & Droge

Exhibit 3 - Capecchi et al. 

Exhibit 4 - Hartley et al. 

Exhibit 5 - Lange-Gustafson et al. 

Exhibit 6 - Declaration of Peter Droge 

Exhibit 7 - Calos et al.

X. APPENDIX C - RELATED PROCEEDINGS

None

CIRCULAR DNA EXPRESSION CASSETTES 
FOR IN VIVO GENE TRANSFER 

Gene therapy consists in correcting a deficiency or an 
abnormality by introducing genetic information into the 
affected cell or organ. This information may be introduced 
either in vitro into a cell extracted from the organ and then 
reinjected into the body, or in vivo, directly into the tissue 
concerned. Being a high molecular weight, negatively 
charged molecule, DNA has difficulties in passing sponta- 
neously through the phospholipid cell membranes. Different 
vectors are hence used in order to permit gene transfer: viral 
vectors on the one hand, natural or synthetic, chemical 
and/or biochemical vectors on the other hand. Viral vectors 
(retroviruses, adenoviruses, adeno-associated viruses, etc.) 
are very effective, in particular in passing through 
membranes, but present a number of risks, such as 
pathogenicity, recombination, replication, immunogenicity, 
etc. Chemical and/or biochemical vectors enable these risks 
to be avoided (for reviews, see Behr, 1993, Cotten and 
Wagner, 1993). These vectors are, for example, cations 
(calcium-phosphate, DEAE-dextran, etc.) which act by 
forming precipitates with DNA, which precipitates can be 
"phagocytosed" by the cells. They can also be liposomes in 
which DNA is incorporated and which fuse with the plasma 
membrane. Synthetic gene transfer vectors are generally 
lipids or cationic polymers which complex DNA and form a 
particle therewith carrying positive surface charges. These 
particles are capable of interacting with the negative charges 
of the cell membrane and then of crossing the latter.
Dioctadecylamidoglycylspermine (DOGS, Transfectam™) or
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium
chloride (DOTMA, Lipofectin™) may be mentioned as
examples of such vectors. Chimeric proteins have also been 
developed: they consist of a polycationic portion which 
condenses DNA, linked to a ligand which binds to a mem- 
brane receptor and carries the complex into the cells by 
endocytosis. It is thus theoretically possible to "target" a 
tissue or certain cell populations so as to improve the in vivo 
bioavailability of the transferred gene. 

However, the use of chemical and/or biochemical vectors 
or of naked DNA implies the possibility of producing large 
amounts of DNA of pharmacological purity. In effect, in 
these gene therapy techniques, the medicinal product con- 
sists of the DNA itself, and it is essential to be able to 
manufacture, in appropriate amounts, DNAs having suitable 
properties for therapeutic use in man. 

The plasmids currently used in gene therapy carry (i) an 
origin of replication, (ii) a marker gene such as a gene for 
resistance to an antibiotic (kanamycin, ampicillin, etc.) and 
(iii) one or more transgenes with sequences required for 
their expression (enhancer(s), promoter(s), polyadenylation 
sequences, etc.). These plasmids currently used in gene 
therapy (in clinical trials such as the treatment of 
melanomas, Nabel et al., 1992, or in experimental studies) 
display, however, some drawbacks associated, in particular, 
with their dissemination in the body. Thus, as a result of this 
dissemination, a competent bacterium present in the body 
can, at a low frequency, receive this plasmid. The chance of 
this occurring is all the greater for the fact that the treatment 
in question entails in vivo gene therapy in which the DNA 
may be disseminated in the patient's body and may come 
into contact with bacteria which infect this patient or alter- 
natively with bacteria of the commensal flora. If the bacte- 
rium which is a recipient of the plasmid is an enterobacte- 
rium such as E. coli, this plasmid may replicate. Such an 
event then leads to the dissemination of the therapeutic gene.
Inasmuch as the therapeutic genes used in gene therapy
treatments can code, for example, for a lymphokine, a
growth factor, an anti-oncogene, or a protein whose function
is lacking in the host and hence enables a genetic defect to
be corrected, the dissemination of some of these genes could
have unforeseeable and worrying effects (for example if a
pathogenic bacterium were to acquire the gene for a human
growth factor). Furthermore, the plasmids used in non-viral
gene therapy also possess a marker for resistance to an
antibiotic (ampicillin, kanamycin, etc.). Hence the bacterium
acquiring such a plasmid has an undeniable selective
advantage, since any therapeutic antibiotic treatment using
an antibiotic of the same family as the one selecting the
resistance gene of the plasmid will lead to the selection of
the plasmid in question. In this connection, ampicillin
belongs to the β-lactams, which is the family of antibiotics
most widely used in the world. It is hence necessary to seek
to limit as far as possible the dissemination of the therapeutic
genes and the resistance genes. Moreover, the genes carried
by the plasmid, corresponding to the vector portion of the
plasmid (function(s) required for replication, resistance
gene), also run the risk of being expressed in the transfected
cells. There is, in effect, a transcription background, which
cannot be ruled out, due to the host's expression signals on
the plasmid. This expression of exogenous proteins may be
thoroughly detrimental in a number of gene therapy
treatments, as a result of their potential immunogenicity and
hence of the attack of the transfected cells by the immune
system.

Hence it is especially important to be able to have at
one's disposal medicinal DNA molecules having a genetic
purity suitable for therapeutic use. It is also especially
important to have at one's disposal methods enabling these
DNA molecules to be prepared in amounts appropriate for
pharmaceutical use. The present invention provides a
solution to these problems.

The present invention describes, in effect, DNA molecules
which can be used in gene therapy, having greatly
improved genetic purity and impressive properties of
bioavailability. The invention also describes an especially
effective method for the preparation of these molecules and for
their purification.

The present invention lies, in particular, in the development
of DNA molecules which can be used in gene therapy,
virtually lacking any non-therapeutic region. The DNA
molecules according to the invention, also designated
minicircles on account of their circular structure, their small
size and their supercoiled form, display many advantages.
They make it possible, in the first place, to eliminate the
risks associated with dissemination of the plasmid, such as
(1) replication and dissemination which may lead to an
uncontrolled overexpression of the therapeutic gene, (2) the
dissemination and expression of resistance genes, and (3)
the expression of genes present in the non-therapeutic
portion of the plasmid, which are potentially immunogenic
and/or inflammatory, and the like. The genetic information
contained in the DNA molecules according to the invention
is limited, in effect, essentially to the therapeutic gene(s) and
to the signals for regulation of its/their expression (neither
origin of replication, nor gene for resistance to an antibiotic,
and the like). The probability of these molecules (and hence
of the genetic information they contain) being transferred to
a microorganism and being stably maintained is almost zero.
Furthermore, due to their small size, DNA molecules
according to the invention potentially have better
bioavailability in vivo. In particular, they display improved
capacities for cell penetration and cellular distribution. Thus, it is
recognized that the coefficient of diffusion in the tissues is
inversely proportional to the molecular weight (Jain, 1987).
Similarly, at cellular level, high molecular weight molecules
have inferior permeability through the plasma membrane. In
addition, for the plasmid to progress to the nucleus, which is
essential for its expression, high molecular weight is also a
drawback, the nuclear pores imposing a size limit for
diffusion to the nucleus (Landford et al., 1986). The
elimination of the non-therapeutic portions of the plasmid (origin
of replication and resistance gene in particular) according to
the invention also enables the size of the DNA molecules to
be decreased. This decrease may be estimated at a factor of
2, reckoning, for example, 3 kb for the origin of replication
and the resistance marker (vector portion) and 3 kb for the
transgene with the sequences required for its expression.
This decrease (i) in molecular weight and (ii) in negative
charge endows the molecules of the invention with
improved capacities for tissue, cellular and nuclear diffusion
and bioavailability.

Hence a first subject of the invention lies in a
double-stranded DNA molecule having the following features: it is
circular in shape and essentially comprises one or more
genes of interest. As stated above, the molecules of the
invention essentially lack non-therapeutic regions, and
especially an origin of replication and/or a marker gene. In
addition, they are advantageously in supercoiled form.

The present invention is also the outcome for the
development of a method, of constructions and of cell hosts which
are specific and especially effective for the production of
these therapeutic DNA molecules. More especially, the
method according to the invention lies in the production of
therapeutic DNA molecules defined above, by excision from
a plasmid or from a chromosome by site-specific
recombination. The method according to the invention is especially
advantageous, since it does not necessitate a prior step of
purification of the plasmid, is very specific, especially
effective, does not decrease the amounts of DNA produced
and leads directly to therapeutic molecules of very great
genetic purity and of great bioavailability. This method
leads, in effect, to the generation of circular DNA molecules
(minicircles) essentially containing the gene of interest and
the regulator sequences permitting its expression in the cells,
tissue, organ or apparatus, or even the whole body, in which
the expression is desired. In addition, these molecules may
then be purified by standard techniques.

The site-specific recombination may be carried out by
means of various systems which lead to site-specific
recombination between sequences. More preferably, the
site-specific recombination in the method of the invention is
obtained by means of two specific sequences which are
capable of recombining with one another in the presence of
a specific protein, generally designated recombinase. For
this reason, the DNA molecules according to the invention
generally comprise, in addition, a sequence resulting from
this site-specific recombination. The sequences permitting
the recombination used in the context of the invention
generally comprise from 5 to 100 base pairs, and more
preferably fewer than 50 base pairs.

The site-specific recombination may be carried out in
vivo (that is to say in the host cell) or in vitro (that is to say
on a plasmid preparation).

In this connection, the present invention also provides
particular genetic constructions suitable for the production

recombination, positioned in the direct orientation. The
position in the direct orientation indicates that the two
sequences follow the same 5'-3' polarity in the recombinant
DNA according to the invention. The genetic constructions
of the invention can be double-stranded DNA fragments
(cassettes) essentially composed of the elements mentioned
above. These cassettes can be used for the construction of
cell hosts having these elements integrated in their genome
(FIG. 1). The genetic constructions of the invention can also
be plasmids, that is to say any linear or circular DNA
molecule capable of replicating in a given host cell,
containing the gene or genes of interest flanked by the two
sequences permitting site-specific recombination, positioned
in the direct orientation. The construction can be, more
specifically, a vector (such as a cloning and/or expression
vector), a phage, a virus, and the like. These plasmids of the
invention may be used to transform any competent cell host
for the purpose of the production of minicircles by
replication of the plasmid followed by excision of the minicircle
(FIG. 2).

In this connection, another subject of the invention lies in
a recombinant DNA comprising one or more genes of
interest, flanked by two sequences permitting site-specific
recombination, positioned in the direct orientation.

The recombinant DNA according to the invention is
preferably a plasmid comprising at least:

a) an origin of replication and optionally a marker gene,

b) two sequences permitting a site-specific recombination,
positioned in the direct orientation, and,

c) placed between said sequences b), one or more genes
of interest.

The specific recombination system present in the genetic
constructions according to the invention can be of different
origins. In particular, the specific sequences and the
recombinases used can belong to different structural classes, and in
particular to the integrase family of bacteriophage λ or to the
resolvase family of the transposon Tn3.

Among recombinases belonging to the integrase family of
bacteriophage λ, there may be mentioned, in particular, the
integrase of the phages lambda (Landy et al., Science 197
(1977) 1147), P22 and φ80 (Leong et al., J. Biol. Chem. 260
(1985) 4468), HP1 of Haemophilus influenza (Hauser et al.,
J. Biol. Chem. 267 (1992) 6859), the Cre integrase of phage
P1, the integrase of the plasmid pSAM2 (EP 350,341) or
alternatively the FLP recombinase of the 2μ plasmid. When
the DNA molecules according to the invention are prepared
by recombination by means of a site-specific system of the
integrase family of bacteriophage lambda, the DNA
molecules according to the invention generally comprise, in
addition, a sequence resulting from the recombination
between two att attachment sequences of the corresponding
bacteriophage or plasmid.

Among recombinases belonging to the family of the
transposon Tn3, there may be mentioned, in particular, the
resolvase of the transposon Tn3 or of the transposons Tn21
and Tn522 (Stark et al., 1992); the Gin invertase of
bacteriophage mu or alternatively the resolvase of plasmids, such
as that of the par fragment of RP4 (Albert et al., Mol.
Microbiol. 12 (1994) 131). When the DNA molecules
according to the invention are prepared by recombination by
means of a site-specific system of the family of the
transposon Tn3, the DNA molecules according to the invention
generally comprise, in addition, a sequence resulting from

of the therapeutic DNA molecules defined above. These the recombination between two recognition sequences of the 

genetic constructions, or recombinant DNAs, according to 65 resolvase of the transposon in question, 

the invention comprise, in particular, the gene or genes of According to a particular embodiment, in the genetic 

interest flanked by the two sequences permitting site-specific constructions of the present invention, the sequences per- 
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mitting site-specific recombination are derived from a bac- recognition sequences of the resolvase of a transposon, or 

teriophage. More preferably, these latter are attachment derived sequences. By way of preferred examples, there may 

sequences (attP and attB sequences) of a bacteriophage, or be mentioned, in particular, the recognition sequences of the 

derived sequences. These sequences are capable of recom- transposons Tn3, Tn21 and Tn522. By way of a preferred 

bining specifically with one another in the presence of a 5 example, there may be mentioned the sequence SEQ ID No. 

recombinase designated integrase. The term derived 15 or a derivative of the latter (see also Sherrat, P. 163-184, 

sequence includes the sequences obtained by Mobile DNA, Ed. D. Berg and M. Howe, American Society 

modification(s) of the attachment sequences of the for Microbiology, Washington D.C. 1989). 

bacteriophages, which retain the capacity to recombine According to another especially advantageous variant, the 

specifically in the presence of the appropriate recombinase. 10 plasmids of the invention comprise, in addition, a multimer 

Thus, such sequences can be reduced fragments of these resolution sequence. This is preferably the mrs (multimer 

sequences or, on the contrary, fragments extended by the resolution system) sequence of the plasmid RK2. More 

IJ -' ; -n of other sequences (restriction sites, and the like). pre ferably, the invention relates to a plasmid comprising: 



They can also be variants obtained by mutation(s), 
particular by point mutation(s). The terms attP and attB 3 
sequences of a bacteriophage or of a plasmid denote, accord- 
ing to the invention, the sequences of the recombination 
system specific to said bacteriophage or plasmid, that is to 
say the attP sequence present in said phage or plasmid and 
e corresponding chromosomal attB sequence. 



(a) a bacterial origin of replication and optionally < 
marker gene, 

(b) the attP and attB sequences of a bacteriophage, in the 
direct orientation, selected from the phages lambda, 
P22, <I>80, HP1 and PI or of plasmid pSAM2 or the 2u 
plasmid, or derived sequences; and, 



By way of preferred examples, there may be mentioned, ( c ) P laced between said sequences b), one or more genes 

in particular, the attachment sequences of the phages of interesl and the mrs sequence of plasmid RK2. 

lambda, P22, <580, PI and HP1 of Haemophilus influenzae This embodiment is especially advantageous. Thus, when 

or alternatively of plasmid pSAM2 or the 2u plasmid. These plasmids pXL2649 or pXL2650 are brought into contact 

sequences are advantageously chosen from all or part of the 25 vrath the integrase of the bacteriophage in vivo, the 

sequences SEQ ID No. 1, SEQ ID No. 2, SEQ ID No. 6, sequences recombine to generate the mimcircle and the 

SEQ ID No. 7, SEQ ID No. 8, SEQ ID No. 9, SEQ ID No. miniplasmid, but also multimeric or topological forms of 

10, SEQ ID No. 11, SEQ ID No. 12, SEQ ID No. 13 and minicircle or of miniplasmid. It is especially advantageous 

SEQ ID No. 14. These sequences comprise, in particular, the t0 be able t0 decrease the concentration of these forms in 

central region homologous to the attachment sequences of 30 order 10 increase the production and facilitate the purifica- 

these phages. tion of minicircle. 

In this connection, a preferred plasmid according to the The multimeric forms of plasmids are known to a person 

present invention comprises " skilled in lhe art - For example, the cer fragment of ColEl 

(a) a bacterial origin of replication and optionally a (Summerset al 1984 Cell 36 p. 1097) or the mrs site of the 
marker gene par locus of RK2 (L. Ebert 1994 Mol. Microbiol. 2 p. 131) 

(b) the attP and attB sequences of a bacteriophage selected * P ermit the "solution of multimers of plasmids and partici- 
• from the phages lambda, P22, O80, HP1 and PI or of ?f 1D an e " hanced f s ' ablht y °* the P lasm f How *? r ' 

i -i cun ,i n , • , , • , whereas resolution at the cer site requires four proteins 

plasmid pSAM2 or the 2a plasmid, or derived . , , _ ,. , . T 

sequences and encoded by the E. coh genome (Colloms et al., 1990 J. 

' , ^ Bacteriol. 172 p. 6973), resolution at the mrs site requires 

(c) placed between Sal d sequences b), one or more genes 40 only ^ p arAp ^ otein 4 which the par Agene is mapped on 
of interest. (he locus of RR2 ^ a regul i{ wou]d appear advan . 

According to an especially preferred embodiment the ^ tQ ^ ^ Qr a ion of ^ loc ^ conlaijyi 

sequences in question are the attachment sequences (attP and and ^ mrg ^ Fof ex ^ mrg ^ 

attB) of phage ambda. Plasmids carrying these sequences b j d be m B d altP ncesof h 

a v;o«n Pa r la V he f^f PX , L264 ^' PXL2649 ° f 45 lambda, and the parA gene be expressed in trans or hi cis 

pXL2650. When these plasmids are brought, in viyo or m from ^ Qwn otef of frorn an ter . In this 

vitro, into contact with the integrase of phage lambda, the connecti a ticu]ar lasmid of the invention compri ses: 

sequences recombine with one another to generate in vivo or , s , . , . . r ,. . , . „ 

in vitro, by excision, a minicircle according to the invention ( a ) a bactenal on g in of ^plication and optionally a 

essentially comprising the elements (c), that is to say the 50 marker gene, 

therapeutic portion (FIG. 2). ( b ) the attP and attB sequences of a bacteriophage, m the 

Still according to a particular embodiment of the direct orientation, selected from the phages lambda, 

invention, the sequences permitting site-specific recombina- P22 > «> 80 > HP1 and P1 or of P las mid P SAM2 or the 2u 

tion are derived from the loxP region of phage PI. This plasmid, or derived sequences, 

region is composed essentially of two inverted repeat 55 (c) placed between said sequences b), one or more genes 

sequences capable of recombining specifically with one of interest and the mrs sequence of plasmid RK2, and 

another in the presence of a protein, designated Cre (d) the parA gene of plasmid RK2. 

(Sternberg et al., J. Mol. Biol. 150 (1971) 467). In a One such plasmid is, in particular, the plasmid pXL2960 

particular variant, the invention hence relates to a plasmid described in the examples. It may be employed, and can 

comprising (a) a bacterial origin of replication and option- 60 enable minicircle to be produced exclusively ir 

ally a marker gene; (b) the inverted repeat sequences of form. 

bacteriophage PI (loxP region); and (c), placed between said According to another advantageous variant, the p. 

sequences (b), one or more genes of interest. of the invention comprise two sets of site-specific recombi- 

According to another particular embodiment, in the nation sequences from a different family. These advanta- 

genetic constructions of the present invention, the sequences 65 geously comprise a first set of integrase-dependent 

permitting site-specific recombination are derived from a sequences and a second set of parA-dependent sequences, 

transposon. More preferably, the sequences in question are The use of two sets of sequences enables the production 
yields of minicircles to be increased when the first site-specific recombination is
incomplete. Thus, when plasmids pXL2650 or pXL2960 are brought into contact with the
integrase of the bacteriophage in vivo, the sequences recombine to generate the
miniplasmid and the minicircle, but this reaction is not complete (5 to 10% of
initial plasmid may be left). The introduction, in proximity to each of the att
sequences of phage lambda, of an mrs sequence of RK2 enables the production of
minicircles to be increased. Thus, after induction of the integrase of phage lambda
and Int-dependent recombination, the unrecombined molecules will be able to come
under the control of the ParA protein of RK2 and to recombine at the mrs sites.
Conversely, after induction of the ParA protein and ParA-dependent recombination,
the unrecombined molecules will be able to come under the control of the integrase
of phage lambda and will be able to recombine at the att sites. Such constructions
thus make it possible to produce minicircle and negligible amounts of unrecombined
molecules. The att sequences, like the mrs sequences, are in the direct orientation,
and the int and parA genes may be induced simultaneously or successively from the
same inducible promoter or from two inducible promoters. Preferably, the sequences
in question are the attB and attP attachment sequences of phage lambda in the direct
orientation and two mrs sequences of RK2 in the direct orientation.

As stated above, another aspect of the present invention lies in a method for the
production of therapeutic DNA molecules defined above, by excision, from a plasmid
or chromosome, by site-specific recombination.

Another subject of the present invention hence lies in a method for the production
of a DNA molecule (minicircle) as defined above, according to which a culture of
host cells containing a recombinant DNA as defined above is brought into contact
with the recombinase enabling site-specific recombination to be induced. More
preferably, the culture and recombinase are brought into contact either by
transfection or infection with a plasmid or a phage containing the gene for said
recombinase; or by induction of the expression of a gene coding for said
recombinase, present in the host cell. As mentioned below, this gene may be present
in the host cell in integrated form in the genome, on a replicative plasmid or
alternatively on the plasmid of the invention, in the non-therapeutic portion.

To permit the production of the minicircles according to the invention by
site-specific recombination in vivo, the recombinase used must be introduced into,
or induced in, cells or the culture medium at a particular instant. For this
purpose, different methods may be used. According to a first method, a host cell is
used containing the recombinase gene in a form permitting its regulated expression.
It may, in particular, be introduced under the control of a promoter or of a system
of inducible promoters, or alternatively in a temperature-sensitive system. In
particular, the gene may be present in a temperature-sensitive phage, latent during
the growth phase, and induced at a suitable temperature (for example lysogenic phage
lambda Xis- cI857). The cassette for expression of the recombinase gene may be
carried by a plasmid, a phage or even by the plasmid of the invention, in the
non-therapeutic region. It may be integrated in the genome of the host cell or
maintained in replicative form. According to another method, the cassette for
expression of the gene is carried by a plasmid or a phage used to transfect or
infect the cell culture after the growth phase. In this case, it is not necessary
for the gene to be in a form permitting its regulated expression. In particular, any
constitutive promoter may be used. The cell may also be brought into contact with
the recombinase in vitro, on a plasmid preparation, by direct incubation with the
protein.

It is preferable, in the context of the present invention, to use a host cell
capable of expressing the recombinase gene in a regulated manner. This embodiment,
in which the recombinase is supplied directly by the host cell after induction, is
especially advantageous. In effect, it suffices simply to place the cells in culture
at the desired time under the conditions for expression of the recombinase gene
(permissive temperature for a temperature-sensitive gene, addition of an inducer for
a regulable promoter, and the like) in order to induce the site-specific
recombination in vivo and thus the excision of the minicircle of the invention. In
addition, this excision takes place in especially high yields, since all the cells
in culture express the recombinase, which is not necessarily the case if a
transfection or an infection has to be carried out in order to transfer the
recombinase gene.

According to a first embodiment, the method of the invention comprises the excision
of the molecules of therapeutic DNA by site-specific recombination from a plasmid.
This embodiment employs the plasmids described above permitting, in a first stage,
replication in a chosen host, and then, in a second stage, the excision of the
non-therapeutic portions of said plasmid (in particular the origin of replication
and the resistance gene) by site-specific recombination, generating the circular DNA
molecules of the invention. To carry out the method, different types of plasmid may
be used, and especially a vector, a phage or a virus. A replicative vector is
preferably used.

Advantageously, the method of the invention comprises a prior step of transformation
of host cells with a plasmid as defined above, followed by culturing of the
transformed cells, enabling suitable amounts of plasmid to be obtained. Excision by
site-specific recombination is then carried out by bringing into contact with the
recombinase under the conditions defined above (FIG. 2). As stated above, in this
embodiment, the site-specific recombination may be carried out in vivo (that is to
say in the host cell) or in vitro (that is to say on a plasmid preparation).

According to a preferred embodiment, the DNA molecules of the invention are hence
obtained from a replicative vector, by excision of the non-therapeutic portion
carrying, in particular, the origin of replication and the marker gene, by
site-specific recombination.

According to another embodiment, the method of the invention comprises the excision
of the DNA molecules from the genome of the host cell by site-specific
recombination. This embodiment is based more especially on the construction of cell
hosts comprising, inserted into their genome, one or more copies of a cassette
comprising the gene of interest flanked by the sequences permitting recombination
(FIG. 1). Different techniques may be used for insertion of the cassette of the
invention into the genome of the host cell. In particular, insertion at several
distinct points of the genome may be obtained by using integrative vectors. In this
connection, different transposition systems such as, in particular, the miniMu
system or defective transposons such as Tn10 derivatives, for example, may be used
(Kleckner et al., Methods Enzymol. 204 (1991) 139; Groisman E., Methods Enzymol. 204
(1991) 180). The insertion may also be carried out by homologous recombination,
enabling a cassette containing two recombination sequences in the direct orientation
flanking one or more genes of interest to be integrated in the genome of the
bacterium. This process may, in addition, be reproduced as many times as desired so
as to have the largest possible number of copies per cell. Another technique also
consists in using an in vivo amplification



system using recombination, as described in Labarre et al.
(Labarre J., O. Reyes, Guyonvarch, and G. Leblon. 1993.
Gene replacement, integration, and amplification at the
gdhA locus of Corynebacterium glutamicum. J. Bacteriol.
175:1001-1007), so as to augment from one copy of the
cassette to a much larger number.

A preferred technique consists in the use of miniMu. To
this end, miniMu derivatives are constructed comprising a
resistance marker, the functions required in cis for their
transposition and a cassette containing two recombination
sequences in the direct orientation flanking the gene or genes
of interest. These miniMus are advantageously placed at
several points of the genome using a resistance marker
(kanamycin, for example) enabling several copies per
genome to be selected (Groisman E. cited above). As
described above, the host cell in question can also express
inducibly a site-specific recombinase leading to the excision
of the fragment flanked by the recombination sequences in
the direct orientation. After excision, the minicircles may be
purified by standard techniques.
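
The excision step described above can be illustrated with a minimal sketch. It is a toy model, not the procedure of the examples: the carrier molecule is treated as a circular string, both recombination sites are assumed to share a short common core (the hypothetical `CORE` below), and the exchange is assumed to occur at that core. All names and sequences (`excise_between_direct_repeats`, `ATT_P`, `ATT_B`, `BACKBONE`, `CASSETTE`) are placeholders introduced for illustration only.

```python
def excise_between_direct_repeats(circle: str, core: str) -> tuple[str, str]:
    """Split a circular molecule at two directly oriented copies of `core`.

    Returns (excised_circle, remainder). Each product keeps exactly one
    copy of the core, flanked by one arm from each parental site, i.e. a
    hybrid site resulting from the recombination.
    Simplification: a core spanning the string's wrap-around point is not found.
    """
    first = circle.find(core)
    second = circle.find(core, first + len(core)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("two copies of the core in direct orientation are required")
    excised = circle[first:second]                   # segment between the two cores
    remainder = circle[second:] + circle[:first]     # everything else, re-closed
    return excised, remainder


# Hypothetical toy sequences; lower-case arms make the hybrid sites visible.
CORE = "TTTATAC"
ATT_P = "ccc" + CORE + "ggg"
ATT_B = "aaa" + CORE + "ttt"
BACKBONE = "ORIMARKER"        # stands for origin of replication + marker gene
CASSETTE = "GENEOFINTEREST"   # stands for the therapeutic portion

plasmid = BACKBONE + ATT_P + CASSETTE + ATT_B
minicircle, miniplasmid = excise_between_direct_repeats(plasmid, CORE)
print(minicircle)    # carries the cassette and one hybrid site
print(miniplasmid)   # carries the backbone and the other hybrid site
```

In this sketch the cassette ends up on one circle and the origin and marker on the other, which is the separation that the direct orientation of the two sites is meant to achieve.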

This embodiment of the method of the invention is
especially advantageous, since it leads to the generation of
a single type of plasmid molecule: the minicircle of the
invention. The cells do not contain, in effect, any other
episomal plasmid, as is the case during production from a
plasmid (FIGS. 1 and 2).

Another subject of the invention also lies in a modified 
host cell comprising, inserted into its genome, one or more 
copies of a recombinant DNA as defined above. 

The invention also relates to any recombinant cell containing
a plasmid as defined above. These cells are obtained
by any technique known to a person skilled in the art
enabling a DNA to be introduced into a given cell. Such a
technique can be, in particular, transformation,
electroporation, conjugation, protoplast fusion or any other
technique known to a person skilled in the art. As regards
transformation, different protocols have been described in
the prior art. In particular, cell transformation may be carried
out by treating whole cells in the presence of lithium acetate
and polyethylene glycol according to the technique
described by Ito et al. (J. Bacteriol. 153 (1983) 163-168), or
in the presence of ethylene glycol and dimethyl sulphoxide
according to the technique of Durrens et al. (Curr. Genet. 18
(1990) 7). An alternative protocol has also been described in
Patent Application EP 361,991. As regards electroporation,
this may be carried out according to Becker and Guarente
(in: Methods in Enzymology Vol. 194 (1991) 182).

The method according to the invention may be carried out
in any type of cell host. Such hosts can be, in particular,
bacteria or eukaryotic cells (yeasts, animal cells, plant cells),
and the like. Among bacteria, E. coli, B. subtilis,
Streptomyces, Pseudomonas (P. putida, P. aeruginosa),
Rhizobium meliloti, Agrobacterium tumefaciens, Staphylococcus
aureus, Streptomyces pristinaespiralis, Enterococcus
faecium or Clostridium, and the like, may be mentioned
more preferentially. Among bacteria, it is preferable to use
E. coli. Among yeasts, Kluyveromyces, Saccharomyces,
Pichia, Hansenula, and the like, may be mentioned. Among
mammalian animal cells, CHO, COS, NIH3T3 cells, and the
like, may be mentioned.

In accordance with the host used, the plasmid according
to the invention is adapted by a person skilled in the art to
permit its replication. In particular, the origin of replication
and the marker gene are chosen in accordance with the host
cell selected.

The marker gene may be a resistance gene, in particular
for resistance to an antibiotic (ampicillin, kanamycin,
geneticin, hygromycin, and the like), or any gene endowing
the cell with a function which it no longer possesses (for
example a gene which has been deleted on the chromosome
or rendered inactive), the gene on the plasmid reestablishing
this function.

In a particular embodiment, the method of the invention 
comprises an additional step of purification of the 
minicircle. 

In this connection, the minicircle may be purified by 
standard techniques of plasmid DNA purification, since it is 
supercoiled like plasmid DNA. These techniques comprise, 
inter alia, purification on a cesium chloride density gradient 
in the presence of ethidium bromide, or alternatively the use 
of anion exchange columns (Maniatis et al., 1989). In 
addition, if the plasmid DNA corresponding to the non- 
therapeutic portions (origin of replication and selectable 
marker in particular) is considered to be present in an 
excessively large amount, it is also possible, after or before 
the purification, to use one or more restriction enzymes 
which will digest the plasmid and not the minicircle, 
enabling them to be separated by techniques that separate 
supercoiled DNA from linear DNA, such as a cesium 
chloride density gradient in the presence of ethidium bro- 
mide (Maniatis et al., 1989). 
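
The choice of restriction enzymes that digest the plasmid and not the minicircle can be sketched as a simple site search. The sketch below is an illustration under stated assumptions, not a protocol from this specification: the three enzymes and their recognition sequences (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT) are common textbook examples chosen here for illustration, the two sequences are invented toy strings, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def cuts_circle(circle: str, site: str) -> bool:
    """True if `site` occurs on either strand of a circular molecule."""
    # Append the first len(site)-1 bases so that sites spanning the
    # arbitrary start/end of the string are also detected.
    unrolled = circle + circle[: len(site) - 1]
    return site in unrolled or reverse_complement(site) in unrolled


def selective_enzymes(non_therapeutic: str, minicircle: str, enzymes: dict) -> list:
    """Enzymes that cut the non-therapeutic circle but leave the minicircle intact."""
    return sorted(
        name
        for name, site in enzymes.items()
        if cuts_circle(non_therapeutic, site) and not cuts_circle(minicircle, site)
    )


# Illustrative enzyme panel (assumption: not named in the text above).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

# Invented toy sequences standing in for the two recombination products.
toy_miniplasmid = "ATGAATTCGGCCAAGCTTAC"   # contains EcoRI and HindIII sites
toy_minicircle = "TTGACAAGCTTGGATCCAA"     # contains HindIII and BamHI sites

selected = selective_enzymes(toy_miniplasmid, toy_minicircle, ENZYMES)
print(selected)
```

An enzyme retained by this filter linearizes the non-therapeutic molecule while the minicircle stays supercoiled, so the two can then be separated by the techniques mentioned above.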

In addition, the present invention also describes an 
improved method for the purification of minicircles. This 
method enables minicircles of very great purity to be 
obtained in large yields in a single step. This improved 
method is based on the interaction between a double- 
stranded sequence present in the minicircle and a specific 
ligand. The ligand can be of various natures, and in particu- 
lar protein, chemical or nucleic acid in nature. It is prefer- 
ably a ligand of the nucleic acid type, and in particular an 
oligonucleotide, optionally chemically modified, capable of 
forming by hybridization a triple helix with the specific 
sequence present in the DNA molecule of the invention. It 
was, in effect, shown that some oligonucleotides were 
capable of specifically forming triple helices with double- 
stranded DNA sequences (Helene et al., Biochim. Biophys. 
Acta 1049 (1990) 99; see also FR 94/15162 incorporated in 
the present application by reference). 

In an especially advantageous variant, the DNA mol- 
ecules of the invention hence contain, in addition, a 
sequence capable of interacting specifically with a ligand 
(FIG. 3). Preferably, it is a sequence capable of forming, by 
hybridization, a triple helix with a specific oligonucleotide. 
This sequence may be positioned at any site of the DNA 
molecule of the invention, provided it does not affect the 
functionality of the gene of interest. This sequence is also 
present in the genetic constructions of the invention 
(plasmids, cassettes), in the portion containing the gene of 
interest (see, in particular, the plasmid pXL2650). 
Preferably, the specific sequence present in the DNA mol- 
ecule of the invention comprises between 5 and 30 base 
pairs. 
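
As an illustration of the length range just stated, the following sketch scans one strand of a sequence for candidate tracts of 5 to 30 base pairs. It assumes that the candidate is a run of purines (A or G) on the scanned strand, which corresponds to the homopurine-homopyrimidine case this description treats as preferred; other kinds of ligand-binding sequence are not covered. The function name and the example sequence are hypothetical.

```python
import re


def find_homopurine_tracts(seq: str, min_len: int = 5, max_len: int = 30) -> list:
    """Return (start, tract) for each maximal A/G run whose length is in range.

    Only the given strand is scanned; a run of C/T on this strand is a
    purine run on the complementary strand and is not reported here.
    Runs longer than max_len are skipped rather than trimmed.
    """
    hits = []
    for match in re.finditer(r"[AG]+", seq.upper()):
        tract = match.group()
        if min_len <= len(tract) <= max_len:
            hits.append((match.start(), tract))
    return hits


# Toy example: a 21-base (GAA)7 tract embedded between pyrimidine flanks.
example = "CCT" + "GAA" * 7 + "TCC"
print(find_homopurine_tracts(example))
```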

The oligonucleotides used for carrying out the method
according to the invention can contain the following bases:

thymidine (T), which is capable of forming triplets with
A.T doublets of double-stranded DNA (Rajagopal et
al., Biochem 28 (1989) 7859);

adenine (A), which is capable of forming triplets with
A.T doublets of double-stranded DNA;

guanine (G), which is capable of forming triplets with
G.C doublets of double-stranded DNA;

protonated cytosine (C+), which is capable of forming
triplets with G.C doublets of double-stranded DNA
(Rajagopal et al., cited above).
Preferably, the oligonucleotide used comprises a homopyrimidine
sequence containing cytosines, and the specific
sequence present in the DNA molecule is a homopurine- 
homopyrimidine sequence. The presence of cytosines makes 
it possible to have a triple helix which is stable at acid pH 
where the cytosines are protonated, and destabilized at 
alkaline pH where the cytosines are neutralized. 

To permit the formation of a triple helix by hybridization, 
it is important for the oligonucleotide and the specific 
sequence present in the DNA molecule of the invention to be 
complementary. In this connection, to obtain the best yields 
and best selectivity, an oligonucleotide and a specific 
sequence which are fully complementary are used in the 
method of the invention. Possible combinations are, in 
particular, a poly(CTT) oligonucleotide and a poly(GAA) 
specific sequence. By way of example, there may be mentioned
the oligonucleotide of sequence GAGGCTTCTTCTTCTTCTTCTTCTT
(SEQ ID No. 5), in which the bases
GAGG do not form a triple helix but enable the oligonucle- 
otide to be spaced apart from the coupling arm. 
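
The pairing between a poly(GAA) specific sequence and a poly(CTT) oligonucleotide can be checked with a short sketch. It assumes the pyrimidine-type third strand described above, in which T recognizes A.T doublets and protonated C recognizes G.C doublets, and it further assumes that the oligonucleotide is written in the same 5' to 3' direction as the purine strand of the target. The function names are hypothetical; the only sequence taken from the text is SEQ ID No. 5.

```python
# Third-strand base used opposite each purine of the target strand
# (T for A.T doublets, C, protonated in the triplex, for G.C doublets).
THIRD_STRAND_FOR = {"A": "T", "G": "C"}


def pyrimidine_oligo_for(purine_target: str) -> str:
    """Write the homopyrimidine third strand for a homopurine target strand."""
    bases = purine_target.upper()
    if set(bases) - set(THIRD_STRAND_FOR):
        raise ValueError("target must contain only A and G")
    return "".join(THIRD_STRAND_FOR[base] for base in bases)


def split_spacer(oligo: str, spacer: str = "GAGG") -> tuple:
    """Separate the non-pairing spacer from the triplex-forming part."""
    if not oligo.startswith(spacer):
        raise ValueError("oligonucleotide does not start with the expected spacer")
    return spacer, oligo[len(spacer):]


SEQ_ID_5 = "GAGGCTTCTTCTTCTTCTTCTTCTT"   # sequence quoted in the text above

spacer, binding_part = split_spacer(SEQ_ID_5)
print(spacer, binding_part, len(binding_part))
print(pyrimidine_oligo_for("GAA" * 7) == binding_part)
```

Read this way, SEQ ID No. 5 consists of the four-base GAGG spacer followed by seven CTT repeats, which is the third strand this rule gives for a (GAA)7 target.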

It is understood, however, that some mismatches may be tolerated, provided they do
not lead to too great a loss of affinity. The oligonucleotide used may be natural
(composed of unmodified natural bases) or chemically modified. In particular, the
oligonucleotide may advantageously possess some chemical modifications enabling its
resistance or its protection against nucleases, or its affinity for the specific
sequence, to be increased.

Thus, the oligonucleotide may be rendered more resistant to nucleases by
modification of the skeleton (e.g. methylphosphonates, phosphorothioates,
phosphotriester, phosphoramidate, and the like). Another type of modification has as
its objective, more especially, to improve the interaction and/or the affinity
between the oligonucleotide and the specific sequence. In particular, a thoroughly
advantageous modification according to the invention consists in methylating the
cytosines of the oligonucleotide. The oligonucleotide thus methylated displays the
noteworthy property of forming a stable triple helix with the specific sequence at
neutral pH. Hence it makes it possible to work at higher pH values than the
oligonucleotides of the prior art, that is to say at pH values where the risks of
degradation of the plasmid DNA are lower.

The length of the oligonucleotide used in the method of the invention is at least 3
bases, and preferably between 5 and 30. An oligonucleotide of length greater than 10
bases is advantageously used. The length may be adapted to each individual case by a
person skilled in the art in accordance with the desired selectivity and stability
of the interaction.

The oligonucleotides according to the invention may be synthesized by any known
technique. In particular, they may be prepared by means of nucleic acid
synthesizers. It is quite obvious that any other method known to a person skilled in
the art may be used.

To carry out the method of the invention, the specific ligand (protein, nucleic
acid, and the like) may be grafted or otherwise onto a support. Different types of
supports may be used for this purpose, such as, in particular, functionalized
chromatography supports, in bulk form or prepacked in columns, functionalized
plastic surfaces or functionalized latex beads, magnetic or otherwise.
Chromatography supports are preferably used. By way of example, the chromatography
supports which may be used are agarose, acrylamide or dextran, as well as their
derivatives (such as Sephadex, Sepharose, Superose, etc.), polymers such as
poly(styrenedivinylbenzene), or grafted or ungrafted silica, for example. The
chromatography columns can function in the diffusion or perfusion mode.

To permit its covalent coupling to the support, the ligand is generally
functionalized. In the case of an oligonucleotide, this may be modified, for
example, with a terminal thiol, amine or carboxyl group at the 5' or 3' position. In
particular, the addition of a thiol, amine or carboxyl group makes it possible, for
example, to couple the oligonucleotide to a support carrying disulphide, maleimide,
amine, carboxyl, ester, epoxide, cyanogen bromide or aldehyde functions. These
couplings form by the establishment of disulphide, thioether, ester, amide or amine
links between the oligonucleotide and the support. Any other method known to a
person skilled in the art may be used, such as bifunctional coupling reagents, for
example.

Moreover, to improve the activity of the coupled oligonucleotide, it may be
advantageous to perform the coupling by means of an "arm". Use of an arm makes it
possible, in effect, to bind the oligonucleotide at a chosen distance from the
support, enabling its conditions of interaction with the DNA molecule of the
invention to be improved. The arm advantageously consists of nucleotide bases that
do not interfere with the hybridization. Thus, the arm may comprise purine bases. By
way of example, the arm may comprise the sequence GAGG.

The DNA molecules according to the invention may be used in any application of
vaccination or of gene and cell therapy, for the transfer of a gene to a body, a
tissue or a given cell. In particular, they may be used for a direct administration
in vivo, or for the modification of cells in vitro or ex vivo with a view to their
implantation in a patient. In this connection, the molecules according to the
invention may be used as they are (in the form of naked DNA), or in combination with
different synthetic or natural, chemical and/or biochemical vectors. The latter can
be, in particular, cations (calcium phosphate, DEAE-dextran, etc.) which act by
forming precipitates with DNA, which precipitates can be "phagocytosed" by the
cells. They can also be liposomes in which the DNA molecule is incorporated and
which fuse with the plasma membrane. Synthetic gene transfer vectors are generally
lipids or cationic polymers which complex DNA and form a particle therewith carrying
positive surface charges. These particles are capable of interacting with the
negative charges of the cell membrane and then of crossing the latter. DOGS
(Transfectam™) or DOTMA (Lipofectin™) may be mentioned as examples of such vectors.
Chimeric proteins have also been developed: they consist of a polycationic portion
which condenses DNA, linked to a ligand which binds to a membrane receptor and
carries the complex into the cells by endocytosis. The DNA molecules according to
the invention may also be used for gene transfer into cells by physical transfection
techniques such as bombardment, electroporation, and the like. In addition, prior to
their therapeutic use, the molecules of the invention may optionally be linearized,
for example by enzymatic cleavage.

In this connection, another subject of the present invention relates to any
pharmaceutical composition comprising at least one DNA molecule as defined above.
This molecule may be naked or combined with a chemical and/or biochemical
transfection vector. The pharmaceutical compositions according to the invention may
be formulated with a view to topical, oral, parenteral, intranasal, intravenous,
intramuscular, subcutaneous, intra-ocular, transdermal, and the like,
administration. Preferably, the DNA molecule is used in an injectable form or by
application. It may be mixed with any pharmaceutically acceptable vehicle for an
injectable formulation, in particular for a direct injection at the site to be
treated. The compositions can be, in particular, in the




form of isotonic sterile solutions, or of dry, in particular lyophilized compositions which, on addition of sterilized water or physiological saline as appropriate, enable injectable solutions to be made up. Diluted Tris or PBS buffers in glucose or sodium chloride may be used in particular. A direct injection of the nucleic acid into the affected region of the patient is advantageous, since it enables the therapeutic effect to be concentrated in the tissues affected. The doses of nucleic acid used may be adapted in accordance with different parameters, and in particular in accordance with the gene, the vector, the mode of administration used, the pathology in question or alternatively the desired treatment period.

The DNA molecules of the invention may contain one or more genes of interest, that is to say one or more nucleic acids (cDNA, gDNA, synthetic or semi-synthetic DNA, and the like) whose transcription and, where appropriate, translation in the target cell generate products of therapeutic, vaccinal, agricultural or veterinary value.

Among the genes of therapeutic value, there may be mentioned, more especially, the genes coding for enzymes, blood derivatives, hormones, lymphokines, namely interleukins, interferons, TNF, and the like (FR 92/03120), growth factors, neurotransmitters or their precursors or synthetic enzymes, trophic factors, namely BDNF, CNTF, NGF, IGF, GMF, aFGF, bFGF, NT3, NT5, and the like; apolipoproteins, namely ApoAI, ApoAIV, ApoE, and the like (FR 93/05125), dystrophin or a minidystrophin (FR 91/11947), tumour suppressive genes, namely p53, Rb, Rap1A, DCC, k-rev, and the like (FR 93/04745), genes coding for factors involved in coagulation, namely factors VII, VIII, IX, and the like, suicide genes, namely thymidine kinase, cytosine deaminase, and the like; or alternatively all or part of a natural or artificial immunoglobulin (Fab, ScFv, and the like), a ligand RNA (WO 91/19813), and the like. The therapeutic gene can also be an antisense gene or sequence whose expression in the target cell enables gene expression or the transcription of cellular mRNAs to be controlled. Such sequences can, for example, be transcribed in the target cell into RNAs complementary to cellular mRNAs, and can thus block their translation into protein, according to the technique described in Patent EP 140,308.

The gene of interest can also be a vaccinating gene, that is to say a gene coding for an antigenic peptide, capable of generating an immune response in man or animals for the purpose of vaccine production. Such antigenic peptides can be, in particular, those specific to the Epstein-Barr virus, the HIV virus, the hepatitis B virus (EP 185,573) or the pseudorabies virus, or alternatively tumour-specific peptides (EP 259,212).

Generally, in the plasmids and molecules of the invention, the gene of therapeutic, vaccinal, agricultural or veterinary value also contains a transcription promoter region which is functional in the target cell or body (i.e. mammals), as well as a region located at the 3' end and which specifies a transcription termination signal and a polyadenylation site (expression cassette). As regards the promoter region, this can be a promoter region naturally responsible for the expression of the gene in question when the latter is capable of functioning in the cell or body in question. The promoter regions can also be those of different origin (responsible for the expression of other proteins, or even synthetic promoters). In particular, the promoter sequences can be from eukaryotic or viral genes. For example, they can be promoter sequences originating from the genome of the target cell. Among eukaryotic promoters, it is possible to use any promoter or derived sequence that stimulates or represses the transcription of a gene, specifically or otherwise, inducibly or otherwise, strongly or weakly. They can be, in particular, ubiquitous promoters (promoter of the HPRT, PGK, α-actin, tubulin, and the like, genes), promoters of intermediate filaments (promoter of the GFAP, desmin, vimentin, neurofilament, keratin, and the like, genes), promoters of therapeutic genes (for example the promoter of the MDR, CFTR, factor VIII, ApoAI, and the like, genes), tissue-specific promoters (promoter of the pyruvate kinase gene, villin gene, gene for intestinal fatty acid binding protein, gene for α-actin of smooth muscle, and the like) or alternatively promoters that respond to a stimulus (steroid hormone receptor, retinoic acid receptor, and the like). Similarly, the promoter sequences may be those originating from the genome of a virus, such as, for example, the promoters of the adenovirus E1A and MLP genes, the CMV early promoter or alternatively the RSV LTR promoter, and the like. In addition, these promoter regions may be modified by the addition of activator or regulator sequences or sequences permitting a tissue-specific or -preponderant expression.

Moreover, the gene of interest can also contain a signal sequence directing the synthesized product into the pathways of secretion of the target cell. This signal sequence can be the natural signal sequence of the product synthesized, but it can also be any other functional signal sequence, or an artificial signal sequence.

Depending on the gene of interest, the DNA molecules of the invention may be used for the treatment or prevention of a large number of pathologies, including genetic disorders (dystrophy, cystic fibrosis, and the like), neurodegenerative diseases (Alzheimer's, Parkinson's, ALS, and the like), cancers, pathologies associated with disorders of coagulation or with dyslipoproteinaemias, pathologies associated with viral infections (hepatitis, AIDS, and the like), or in the agricultural and veterinary fields, and the like.

The present invention will be described more completely by means of the examples which follow, which are to be regarded as illustrative and non-limiting.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1: Production of a minicircle from a cassette integrated in the genome.

FIG. 2: Production of a minicircle from a plasmid.

FIG. 3: Production of a minicircle containing a sequence specific to a ligand.

FIG. 4: Construction of pXL2649. Ori: Origin of replication; Kanr: Marker gene conferring resistance to kanamycin; Ampr: Marker gene conferring resistance to ampicillin; galK: Galactosidase gene of E. coli; Plac: Promoter of the lactose operon.

FIG. 5: Luciferase activity obtained after transfection of NIH3T3 mouse fibroblasts with plasmid pXL2650, the minicircle generated from plasmid pXL2650 and PGL2-Control (Promega Biotech). The transfection was carried out under the following conditions: 0.5 mg of DNA per well, 50,000 cells per well. The lipofectant used is RPR 115335. The result is recorded in RLU per microgram of proteins as a function of the lipofectant/DNA charge ratio.

FIG. 6: Construction of the plasmid pXL2793. This plasmid generates, after recombination, a minicircle containing a synthetic homopurine-homopyrimidine sequence and the luciferase cassette of pXL2727.

FIG. 7: Well 1 corresponds to the SalI digestion of the fraction eluted after purification with a triple-helix column. Well 2 corresponds to the XmnI digestion of the fraction
eluted after purification with a triple-helix column. Well 3 corresponds to the undigested fraction eluted after purification with a triple-helix column. Well 4 corresponds to uninduced, undigested plasmid pXL2793. Wells 5 and 6 correspond, respectively, to the linear DNA and supercoiled DNA size markers.

FIG. 8: Diagrammatic description of the construction of the plasmid pXL2776.

FIG. 9: Diagrammatic description of the constructions of the plasmids pXL2777 and pXL2960.

FIG. 10: Action of the integrase of bacteriophage λ in E. coli on plasmids pXL2777 and pXL2960. M: linear DNA or supercoiled DNA 1 kb molecular weight marker. N.I.: not induced. I: induced. N.D.: not digested.

FIG. 11: Kinetics of recombination of the integrase of bacteriophage λ in E. coli on plasmids pXL2777 and pXL2960. 2': 2 minutes. O/N: 14 hours. M: linear DNA or supercoiled DNA 1 kb molecular weight marker. N.I.: not induced. I: induced. N.D.: not digested.

General techniques of cloning and molecular biology. 

The standard methods of molecular biology, such as centrifugation of plasmid DNA in a cesium chloride-ethidium bromide gradient, digestion with restriction enzymes, gel electrophoresis, electroelution of DNA fragments from agarose gels, transformation in E. coli, precipitation of nucleic acids, and the like, are described in the literature (Maniatis et al., 1989, Ausubel et al., 1987). Nucleotide sequences were determined by the chain termination method according to the protocol already put forward (Ausubel et al., 1987).

Restriction enzymes were supplied by New England Biolabs (Biolabs), Bethesda Research Laboratories (BRL) or Amersham Ltd. (Amersham).

To carry out ligation, DNA fragments are separated according to their size on 0.7% agarose or 8% acrylamide gels, purified by electrophoresis and then electroelution, extracted with phenol, precipitated with ethanol and then incubated in a buffer comprising 50 mM Tris-HCl, pH 7.4, 10 mM MgCl2, 10 mM DTT, 2 mM ATP in the presence of phage T4 DNA ligase (Biolabs). Oligonucleotides are synthesized using phosphoramidite chemistry with the latter derivatives protected at the β position by a cyanoethyl group (Sinha et al., 1984, Giles 1985), with the Biosearch 8600 automatic DNA synthesizer, using the manufacturer's recommendations.

The ligated DNAs are used to transform the following strains rendered competent: E. coli MC1060 [(LacIOPZYA)X74, galU, galK, strAr, hsdR] (Casadaban et al., 1983); HB101 [hsdS20, supE44, recA13, ara-14, proA2, lacY1, galK2, rpsL20, xyl-5, mtl-1, F-] (Maniatis et al., 1989); and DH5α [endA1 hsdR17 supE44 thi-1 recA1 gyrA96 relA1 λ- φ80 dlacZΔM15] for the plasmids.

LB and 2XTY culture media are used for the bacteriological part (Maniatis et al., 1989).

Plasmid DNAs are purified according to the alkaline lysis technique (Maniatis et al., 1989).

Definition of the terms employed and abbreviations.

Recombinant DNA: set of techniques which make it possible either to combine, within the same microorganism, DNA sequences which are not naturally combined, or to mutagenize a DNA fragment specifically.

ATP: adenosine 5'-triphosphate
BSA: bovine serum albumin
PBS: 10 mM phosphate buffer, 150 mM NaCl, pH 7.4
dNTP: 2'-deoxyribonucleoside 5'-triphosphates
DTT: dithiothreitol
kb: kilobases
bp: base pairs

EXAMPLE 1

Construction of a Plasmid Carrying the attP and attB Sequences of the Bacteriophage, in Repeated Direct Orientations.

The plasmid pNH16a was used as starting material, inasmuch as it already contains a fragment of bacteriophage λ carrying the attP sequence (Hasan and Szybalski, 1987). This plasmid was digested with EcoRI. Oligonucleotides which contain the attB sequence (Landy, 1989) were synthesized. They have the following sequence:

Oligonucleotide 5476 (SEQ ID No. 1)
5'-AATTGTGAAGCCTGCTTTTTTATACTAACTTGAGCGG-3'
Oligonucleotide 5477 (SEQ ID No. 2)
5'-AATTCCGCTCAAGTTAGTATAAAAAAGCAGGCTTCAC-3'

They were hybridized to re-form the attB sequence and then ligated at the EcoRI site of the 4.2-kb EcoRI fragment of pNH16a (Hasan and Szybalski, 1987). After transformation of DH5α, a recombinant clone was retained. The plasmid thereby constructed was designated pXL2648 (see FIG. 4). This plasmid contains the attP and attB sequences of the bacteriophage in the direct orientation. Under the action of the integrase of the bacteriophage (Int protein), there should be excision of the sequences lying between the two att sites. This results in separation of the material inserted between the two att sequences from the origin of replication and from the resistance marker of the plasmid, which are positioned on the outside.

EXAMPLE 2

Obtaining a Minicircle in vivo in E. coli.

A cassette for resistance to kanamycin was cloned at the EcoRI site of plasmid pXL2648 (FIG. 4). This cassette originates from the plasmid pUC4KIXX (Pharmacia Biotech.). For this purpose, 10 µg of plasmid pUC4KIXX were digested with EcoRI and then separated by agarose gel electrophoresis; the 1.6-kb fragment containing the kanamycin resistance marker was purified by electro-elution; it was then ligated to plasmid pXL2648 linearized with EcoRI. The recombinant clones were selected after transformation into E. coli DH5α and selection for resistance to kanamycin. The expected restriction profile was observed on one clone; this plasmid clone was designated pXL2649 (FIG. 4). This plasmid was introduced by transformation into two E. coli strains:

D1210 [hsdS20, supE44, recA13, ara-14, proA2, lacY1, galK2, rpsL20, xyl-5, mtl-1, λ-, F-, lacIq] (Sadler et al., 1980).

D1210HP, which corresponds to D1210 lysogenized with the phage λ xis- (Xis- Kil-) cI857 (Podjaska et al., 1985). The D1210HP strain [supE44 ara-14 galK2 Δ(gpt-proA)62 rpsL20 xyl5 mtl1 recA13 Δ(mcrC-mrr) hsdS lacIq] (λ[cI857 xis- kil-]), accession number I-2314, was deposited on Sep. 15, 1999 with the Collection Nationale de Cultures de Microorganismes (CNCM), Institut Pasteur, 25 rue du Docteur Roux, F-75724 Paris Cedex 15, FRANCE.

The transformants were selected at 30° C. on 2XTY medium with kanamycin (50 mg/l). After reisolation on selective medium, the strains were inoculated into 5 ml of L
medium supplemented with kanamycin (50 mg/l). After 16 h of incubation at 30° C. with agitation (5 cm of rotational amplitude), the cultures were diluted to 1/100 in 100 ml of the same medium. These cultures were incubated under the same conditions until an OD610 of 0.3 was reached. At this point, half of the culture was removed and then incubated for 10 min at 42° C. to induce the lytic cycle of the phage, hence the expression of the integrase. After this incubation, the cultures were transferred again to 30° C. and then incubated for 1 h under these conditions. Next, culturing was stopped and minipreparations of plasmid DNA were produced. Irrespective of the conditions, in the strain D1210, the agarose gel electrophoresis profile of the undigested plasmid DNA of plasmid pXL2649 is unchanged, as is also the case in the strain D1210HP which has not been thermally induced. On the contrary, in D1210HP which has been incubated for 10 min at 42° C. and then cultured for 1 hour at 30° C., it is found that there is no longer a plasmid, but two circular DNA molecules: one of low molecular weight, migrating faster and containing an EcoRI site; and one of higher molecular weight, containing a unique BglI site, as expected. Hence there has indeed been excision of the sequences present between the two att sequences, and generation of a minicircle bereft of any origin of replication. This supercoiled circular DNA not carrying an origin of replication is termed a minicircle. This name takes, in effect, better account of the circular nature of the molecule. The starting plasmid pXL2649 is present, but it represents approximately 10% of the plasmid which has excised the sequences flanked by att.

The minicircle may then be purified by standard techniques of plasmid DNA purification, since it is supercoiled like plasmid DNA. These techniques comprise, inter alia, purification on a cesium chloride density gradient in the presence of ethidium bromide, or alternatively the use of anion exchange columns (Maniatis et al., 1989). In addition, if the plasmid DNA corresponding to the origin of replication and to the selectable marker is considered to be present in an excessively large amount, it is always possible, after purification, to use one or more restriction enzymes which will digest the plasmid and not the minicircle, enabling them to be separated by techniques that separate supercoiled DNA from linear DNA, such as in a cesium chloride density gradient in the presence of ethidium bromide (Maniatis et al., 1989).

EXAMPLE 3

Obtaining a Minicircle Containing a Cassette for the Expression of Luciferase.

In order to test the use of these minicircles in vivo, a reporter gene with the sequences required for its expression was cloned into plasmid pXL2649 (see Example 2). This was done using, more especially, a 3150-bp BglII-BamHI cassette originating from pGL2-Control (Promega Biotech). This cassette contains the SV40 early promoter, the enhancer of the SV40 early promoter, the luciferase gene of Photinus pyralis and a polyadenylation site derived from SV40. The 3150-bp BglII-BamHI fragment was cloned at the BamHI site of pXL2649 digested with BamHI so as to replace the cassette for resistance to kanamycin by the cassette for the expression of luciferase from pGL2-control. The plasmid thus constructed was called pXL2650. In this plasmid, the attP and attB sites flank the cassette for the expression of luciferase. Site-specific recombination enables only the sequences required for the expression of luciferase together with the luciferase gene to be excised. This recombination may be carried out exactly as described in Example 2. A minicircle such as plasmid pXL2650 may be used thereafter in in vivo or in vitro transfection experiments.

A 1-liter culture of the strain D1210HP pXL2650 in 2XTY medium supplemented with ampicillin (50 mg/ml) was set up at 30° C. At an OD610 equal to 0.3, the culture was transferred to 42° C. for 20 min, then replaced for 20 min at 30° C. The episomal DNA was prepared by the clear lysate technique (Maniatis et al., 1989), followed by a cesium chloride density gradient supplemented with ethidium bromide (Maniatis et al., 1989), then by an extraction of the ethidium bromide with isopropanol and by a dialysis. This DNA was shown to contain the minicircle. 100 [unit illegible] of this preparation were digested with PstI and the hydrolysate was then subjected to a cesium chloride density gradient supplemented with ethidium bromide (Maniatis et al., 1989). An identical result is obtained when the preparation is digested jointly with [illegible] and XmnI. The supercoiled form was recovered and, after removal of the ethidium bromide (Maniatis et al.), it was found to correspond only to the minicircle, lacking an origin of replication and any marker gene. This minicircle preparation may be used for in vitro and in vivo transfection experiments.

EXAMPLE 4

In vitro Transfection of Mammalian Cells, and More Especially of Human Cells, with a Minicircle.

The minicircle DNA containing the luciferase gene of Photinus pyralis as described in Example 3, that is to say corresponding to the minicircle generated from plasmid pXL2650, is diluted in 150 mM NaCl and mixed with a transfectant. It is possible to use various commercial transfectants, such as dioctadecylamidoglycylspermine (DOGS, Transfectam™, Promega), Lipofectin™ (Gibco-BRL), and the like, in different positive/negative charge ratios. By way of illustration, the transfecting agent was used in charge ratios greater than or equal to 3. The mixture is vortexed, left for 10 minutes at room temperature, diluted in culture medium without fetal calf serum, and then added to the cells in the proportion of 2 µg of DNA per culture well. The cells used are Caco-2, derived from a human colon adenocarcinoma, cultured according to a protocol described (Wils et al., 1994) and inoculated on the day before the experiment into 48-well culture plates in the proportion of 50,000 cells/well. After two hours at 37° C., 10% v/v of fetal calf serum is added and the cells are incubated for 24 hours at 37° C. in the presence of 5% CO2. The cells are washed twice with PBS and the luciferase activity is measured according to the protocol described (such as the Promega kit). It is possible to use other lines (fibroblasts, lymphocytes, etc.) originating from different species, or alternatively cells taken from an individual (fibroblasts, [illegible], lymphocytes, etc.) and which will be reinjected [remainder illegible].

EXAMPLE 5

In vitro Transfection of NIH 3T3 Cells.

The minicircle DNA containing the luciferase gene of Photinus pyralis, as described in Example 3, that is to say corresponding to the minicircle generated from plasmid pXL2650, was transfected in vitro into mammalian cells; pXL2650 and PGL2-Control (Promega Biotech.), which contain the same expression cassette, were used as control. The cells used are NIH 3T3 mouse fibroblasts, inoculated on the day before the experiment into 24-well culture plates in the proportion of 50,000 cells per well. The plasmid is diluted in 150 mM NaCl and mixed with the lipofectant RPR115335. However, it is possible to use various other commercial agents such as dioctadecylamidoglycylspermine (DOGS, Transfectam™, Promega) (Demeneix et al., Int. J. Dev. Biol. 35 (1991) 481), Lipofectin™ (Gibco-BRL)
(Felgner et al., Proc. Natl. Acad. Sci. USA 84 (1987) 7413), and the like. A positive charge of the lipofectant/negative charge of the DNA ratio equal to or greater than 3 is used. The mixture is vortexed, left for ten minutes at room temperature, diluted in medium without fetal calf serum, and then added to the cells in the proportion of 0.5 mg of DNA per culture well. After two hours at 37° C., 10% by volume of fetal calf serum is added and the cells are incubated for 48 hours at 37° C. in the presence of 5% CO2. The cells are washed twice with PBS and the luciferase activity is measured according to the protocol described (Promega kit, Promega Corp. Madison, Wis.), on a Lumat LB9501 luminometer (EG and G Berthold, Evry). The transfection results corresponding to the conditions which have just been stated are presented in FIG. 5. They show unambiguously that the minicircle has the same transfection properties as plasmids possessing an origin of replication. Thus these minicircles could be used in the same way as standard plasmids in gene therapy applications.

EXAMPLE 6

Affinity Purification of a Minicircle Using a Triple-helix Interaction.

This example describes a method of purification of a minicircle according to the invention from a mixture containing the plasmid form which has excised it, by triple-helix type interactions which will take place with a synthetic DNA sequence carried by the minicircle to be purified. This example demonstrates how the technology of purification by triple-helix formation may be used to separate a minicircle from a plasmid form which has excised it.

6-1. Obtaining a Minicircle Containing a Synthetic Homopurine-homopyrimidine Sequence

6-1.1. Insertion of a homopurine-homopyrimidine sequence into plasmid pXL2650

Plasmid pXL2650 possesses a unique BamHI site immediately after the cassette containing the luciferase gene of Photinus pyralis. This unique site was used to clone the following two oligonucleotides:

4957 (SEQ ID No. 3)
5'-GATCCGAAGAAGAAGAAGAAGAAGAAG
AAGAAGAAGAAGAAGAAGAAGAAGAAGAAC-3'
4958 (SEQ ID No. 4)
5'-GATCGTTCTTCTTCTTCTTCTTCTTCTTCT
TCTTCTTCTTCTTCTTCTTCTTCTTCG-3'

These oligonucleotides, when hybridized and cloned into plasmid pXL2650, introduce a homopurine-homopyrimidine sequence (GAA)17, as described above.

To carry out this cloning, the oligonucleotides were first hybridized in the following manner. One µg of each of these two oligonucleotides were placed together in 40 ml of a final buffer comprising 50 mM Tris-HCl, pH 7.4, 10 mM MgCl2. This mixture was heated to 95° C. and was then placed at room temperature so that the temperature would fall slowly. Ten ng of the mixture of hybridized oligonucleotides were ligated with 200 ng of plasmid pXL2650 linearized with BamHI, 30 ml of final. After ligation, an aliquot was transformed into DH5. The transformation mixtures were plated out on L medium supplemented with ampicillin (50 mg/l). Twenty-four clones were digested with PflMI and BamHI. One clone was found which had the size of the 950-bp PflMI-BamHI fragment increased by 50 bp. This clone was selected and designated pXL2651.

Plasmid pXL2651 was purified according to the Wizard Megaprep kit (Promega Corp., Madison, Wis.) according to the supplier's recommendations.

6-1.2. Insertion of a homopurine-homopyrimidine sequence into plasmid pXL2649

a) Insertion of new restriction sites on each side of the kanamycin cassette of pXL2649.

Plasmid pXL2649, as described in Example 2, was digested with EcoRI so as to take out the kanamycin cassette originating from plasmid pUC4KIXX (Pharmacia Biotech, Uppsala, Sweden). For this purpose, 5 mg of plasmid pXL2649 were digested with EcoRI. The 4.2-kb fragment was separated by agarose gel electrophoresis and purified by electroelution.

In addition, the plasmid pXL1571 was used. The latter was constructed from the plasmid pFR10 (Gene 25 (1983), 71-88), into which the 1.6-kb fragment originating from pUC4KIXX, corresponding to the kanamycin gene, was inserted at the SstI site. This cloning enabled 12 new restriction sites to be inserted on each side of the kanamycin gene.

Five micrograms of pXL1571 were digested with EcoRI. The 1.6-kb fragment corresponding to the kanamycin gene was separated by agarose gel electrophoresis and purified by electroelution. It was then ligated with the 4.2-kb EcoRI fragment of pXL2649. The recombinant clones were selected after transformation into E. coli DH5α and selection for resistance to kanamycin and to ampicillin. The expected restriction profile was observed on one clone; this plasmid clone was designated pXL2791.

b) Extraction of the kanamycin cassette from plasmid pXL2791

Plasmid pXL2791 was digested with SstI so as to take out the kanamycin cassette. The 4.2-kb fragment was separated by agarose gel electrophoresis and purified with the Jetsorb extraction gel kit (Genomed). It was then ligated. The recombinant clones were selected for resistance to ampicillin after transformation into E. coli DH5α. The expected restriction profile was observed on one clone. This plasmid clone was designated pXL2792. This clone comprises, inter alia, SalI and XmaI restriction sites between the attP and attB sites.

c) Cloning of a homopurine-homopyrimidine sequence as well as of a cassette permitting the expression of luciferase between the two attP and attB sites of plasmid pXL2792

Plasmid pXL2727 was used. This plasmid, digested with XmaI and SalI, enables a fragment comprising the following to be taken out: the pCMV promoter, the luciferase gene of Photinus pyralis, a polyadenylation site derived from SV40 and a homopurine-homopyrimidine sequence. The latter was obtained after hybridization and cloning of the following two oligonucleotides:

6006: (SEQ ID No. 16)
5'-GATCTGAAGAAGAAGAAGAAGAAGAAGA
AGAAGAAGAAGAAGAAGAAGAAGAAGAAC
TGCAGATCT-3'
6008: (SEQ ID No. 17)
5'-GATCAGATCTGCAGTTCTTCTTCTTCTTCTT
CTTCTTCTTCTTCTTCT
TCTTCTTCTTCTTCTTCA-3'

The homopurine-homopyrimidine sequence present in pXL2727 was sequenced by the Sequenase Version 2.0 method (United States Biochemical Corporation). The result obtained shows that the homopurine-homopyrimidine sequence actually present in plasmid pXL2727 contains 10 repeats (GAA-CTT), and not 17 as the sequence of the oligonucleotides 6006 and 6008 suggested would be the case. The sequence actually present in plasmid pXL2727, read after sequencing on the strand corresponding to the oligonucleotide 6008, is as follows:
5'-GATCAGATCTGCAGTCTCTTCTTCTTCTT
CTTCTTCTTCTTCTTCTTCTCTTCTCA-3' (SEQ ID No. 18)
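
The repeat counts quoted above can be checked directly from the two sequences. The following is a minimal Python sketch and not part of the original disclosure; the function name longest_tandem_run is an illustrative assumption. It counts the longest uninterrupted run of TTC (the reverse complement of GAA) on the strand corresponding to oligonucleotide 6008, first for the oligonucleotide as designed (SEQ ID No. 17) and then for the sequence actually read in pXL2727 (SEQ ID No. 18).

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Oligonucleotide 6008 as designed (SEQ ID No. 17), split as printed.
OLIGO_6008 = (
    "GATCAGATCTGCAGTTCTTCTTCTTCTTCTT"
    "CTTCTTCTTCTTCTTCT"
    "TCTTCTTCTTCTTCTTCA"
)

# Strand actually read in pXL2727 (SEQ ID No. 18), split as printed.
SEQ_ID_18 = (
    "GATCAGATCTGCAGTCTCTTCTTCTTCTT"
    "CTTCTTCTTCTTCT"
    "TCTTCTCTTCTCA"
)

designed_repeats = longest_tandem_run(OLIGO_6008, "TTC")
observed_repeats = longest_tandem_run(SEQ_ID_18, "TTC")
print(designed_repeats, observed_repeats)
```

With the sequences transcribed as above, the sketch reports 17 units for the designed oligonucleotide and 10 for the sequenced insert, in line with the figures given in the text.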

One microgram of pXL2727 was digested with XmaI and SalI. The 3.7-kb fragment was separated by agarose gel electrophoresis and purified with the Jetsorb extraction gel kit (Genomed). In addition, 1.7 mg of pXL2792 were digested with XmaI and SalI. The 4.2-kb fragment was separated on agarose gel, purified with the Jetsorb extraction gel kit (Genomed) and ligated with the 3.7-kb XmaI-SalI fragment of pXL2727. The recombinant clones were selected after transformation into E. coli DH5α and selection for resistance to ampicillin. The expected restriction profile was observed on one clone; this clone was designated pXL2793. Plasmid pXL2793 was purified using a caesium chloride density gradient according to a method already described (Maniatis et al., 1989).
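
The size bookkeeping behind this construction can be sketched as follows. This is an illustrative Python model only: the helper names ligate and excise and the att site coordinates are hypothetical, chosen so that the products match the rounded sizes quoted in section 6-3.2 (a 7.9-kb plasmid giving a 4-kb minicircle and a 3.9-kb plasmid). It assumes that recombination between two directly repeated sites on a circle yields two circles whose sizes add up to that of the parent.

```python
def ligate(*fragment_sizes_bp: int) -> int:
    """Size of the circular product obtained by joining linear fragments."""
    return sum(fragment_sizes_bp)


def excise(plasmid_bp: int, site_a_bp: int, site_b_bp: int) -> tuple[int, int]:
    """Sizes of the two circles produced by recombination between two
    directly repeated sites at positions site_a_bp < site_b_bp.

    Returns (excised circle, remaining circle)."""
    if not 0 <= site_a_bp < site_b_bp <= plasmid_bp:
        raise ValueError("site positions must lie in order on the plasmid")
    excised = site_b_bp - site_a_bp
    return excised, plasmid_bp - excised


# 4.2-kb pXL2792 backbone + 3.7-kb XmaI-SalI fragment of pXL2727.
PXL2793_BP = ligate(4200, 3700)

# Hypothetical att coordinates (assumption, not taken from the source).
ATT_A_BP, ATT_B_BP = 100, 4100

minicircle_bp, miniplasmid_bp = excise(PXL2793_BP, ATT_A_BP, ATT_B_BP)
print(PXL2793_BP, minicircle_bp, miniplasmid_bp)
```

The sizes in the source are rounded to 0.1 kb, so the base-pair values here are approximations rather than measured lengths.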

6-2. Preparation of the Column Enabling Triple-helix Type Interactions with a Homopurine-homopyrimidine Sequence Present in the Minicircle to be Effected

The column was prepared in the following manner:

The column used is a 1-ml HiTrap column activated with NHS (N-hydroxysuccinimide, Pharmacia), connected to a peristaltic pump (flow rate <1 ml/min). The specific oligonucleotide used possesses an NH2 group at the 5' end.

For plasmid pXL2651, its sequence is as follows:
5'-GAGGCTTCTTCTTCTTCTTCTTCTT-3' (SEQ ID No. 5)

For plasmid pXL2793, its sequence is as follows (oligo 116418):
5'-CTTCTTCTTCTTCTTCTTCTT-3' (SEQ ID No. 19)
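
As a hedged illustration of why these pyrimidine oligonucleotides suit a poly(GAA) target: in the pyrimidine triple-helix motif, the third strand is generally described as running parallel to the purine strand of the duplex, with T placed over A and protonated C placed over G. The Python sketch below applies that rule to a (GAA)7 tract. The names HOOGSTEEN_PARTNER and third_strand are illustrative assumptions, and the model ignores pH, length and mismatch effects.

```python
# Third-strand base placed over each purine of the duplex (pyrimidine motif).
HOOGSTEEN_PARTNER = {"A": "T", "G": "C"}


def third_strand(purine_tract: str) -> str:
    """Return the parallel pyrimidine third strand for a homopurine tract,
    both written 5' to 3'."""
    try:
        return "".join(HOOGSTEEN_PARTNER[base] for base in purine_tract)
    except KeyError as exc:
        raise ValueError(f"not a homopurine tract: {exc.args[0]!r}") from None


SEQ_ID_5 = "GAGGCTTCTTCTTCTTCTTCTTCTT"   # column oligonucleotide for pXL2651
SEQ_ID_19 = "CTTCTTCTTCTTCTTCTTCTT"      # column oligonucleotide for pXL2793

predicted = third_strand("GAA" * 7)
print(predicted == SEQ_ID_19)       # SEQ ID No. 19 covers seven GAA units
print(SEQ_ID_5.endswith(predicted))  # SEQ ID No. 5 carries the same tract
```

Under this rule both column oligonucleotides contain a (CTT)7 stretch matching seven GAA units of the target; SEQ ID No. 5 carries four additional 5' bases (GAGG) that the sketch does not interpret.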

The buffers used are the following:

Coupling buffer: 0.2 M NaHCO3, 0.5 M NaCl, pH 8.3.

Washing buffer:

Buffer A: 0.5 M ethanolamine, 0.5 M NaCl, pH 8.3.

Buffer B: 0.1 M acetate, 0.5 M NaCl, pH 4.

Fixing and eluting buffer:

Buffer F: 2 M NaCl, 0.2 M acetate, pH 4.5.

Buffer E: 1 M Tris-HCl, pH 9, 0.5 mM EDTA.
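
For readers preparing these buffers, the arithmetic from molarity to mass can be sketched as below. This is an illustration under stated assumptions, not a protocol from the source: the 100-ml batch volume is hypothetical, the molar masses are standard textbook values, and pH adjustment is not modelled. Only the coupling buffer and buffer A are shown, since the acetate salt used in buffers B and F is not specified.

```python
# Molar masses in g/mol (standard values; not given in the source).
MOLAR_MASS = {"NaHCO3": 84.007, "NaCl": 58.44, "ethanolamine": 61.08}


def grams_needed(molarity: float, volume_ml: float, molar_mass: float) -> float:
    """Mass of solute (g) for a solution of the given molarity and volume."""
    return molarity * (volume_ml / 1000.0) * molar_mass


def recipe(components: dict[str, float], volume_ml: float) -> dict[str, float]:
    """Grams of each component, rounded to 0.01 g."""
    return {
        name: round(grams_needed(molarity, volume_ml, MOLAR_MASS[name]), 2)
        for name, molarity in components.items()
    }


BATCH_ML = 100  # hypothetical batch volume

coupling_buffer = recipe({"NaHCO3": 0.2, "NaCl": 0.5}, BATCH_ML)
buffer_a = recipe({"ethanolamine": 0.5, "NaCl": 0.5}, BATCH_ML)
print(coupling_buffer)
print(buffer_a)
```

Ethanolamine is a liquid at room temperature, so in practice the computed mass would be converted to a volume using its density, which this sketch leaves out.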

The column is prepared in the following manner:

The column is washed with 6 ml of 1 mM HCl, and the oligonucleotide diluted in the coupling buffer (50 nmol in 1 ml) is then applied to the column and left for 30 minutes at room temperature. The column is washed with 3 ml of coupling buffer, then with 6 ml of buffer A, followed by 6 ml of buffer B. The latter two buffers are applied three times in succession to the column. In this way, the oligonucleotide is linked covalently to the column via a CONH link. The column is stored at 4° C. in PBS, 0.1% NaN3.
6-3. Purification of a Minicircle Containing a Synthetic Homopurine-homopyrimidine Sequence, by a Triple-helix Type Interaction

6-3.1. Purification of plasmid pXL2651

Plasmid pXL2651 was introduced into the strain D1210HP. This recombinant strain [D1210HP (pXL2651)] was cultured as described in Example 3 so as to generate the minicircle containing the luciferase gene of Photinus pyralis. Twenty ml of culture were removed and centrifuged. The cell pellet is taken up in 1.5 ml of 50 mM glucose, 25 mM Tris-HCl, pH 8, 10 mM EDTA. Lysis is carried out with 2 ml of 0.2 M NaOH, 1% SDS, and neutralization with 1.5 ml of 3 M potassium acetate, pH 5. The DNA is then precipitated with 3 ml of 2-propanol, and the pellet is taken up in 0.5 ml of 0.2 M sodium acetate, pH 5, 0.1 M NaCl and loaded onto an oligonucleotide column capable of forming triple-helix type interactions with poly(GAA) sequences contained in the minicircle, as described above. After the column has been washed beforehand with 6 ml of buffer F, the solution containing the minicircle to be purified is incubated, after being applied to the column, for two hours at room temperature. The column is washed with 10 ml of buffer F and elution is then carried out with buffer E.

Purified DNA corresponding to the minicircle is thereby 
obtained. The minicircle obtained, analysed by agarose gel 
electrophoresis and ethidium bromide staining, takes the 
form of a single band of supercoiled circular DNA. Less
than 5% of starting plasmid pXL2651 is left in the preparation.

6-3.2. Purification of plasmid pXL2793 

The 7.9-kb plasmid pXL2793 was introduced into the 
strain D1210HP. This recombinant strain was cultured as 
described in Example 3, so as to generate the 4-kb minicircle 
containing the luciferase gene of Photinus pyralis and a 
3.9-kb plasmid. Two hundred ml of culture were removed 
and centrifuged. The cell pellet was treated with the Wizard 
Megaprep kit (Promega Corp., Madison, Wis.) according to 
the supplier's recommendations. The DNA was taken up in 
a final volume of 2 ml of 1 mM Tris, 1 mM EDTA, pH 8. 
Two hundred and fifty microliters of this plasmid sample 
were diluted with buffer F in a final volume of 2.5 ml. The 
column was washed beforehand with 6 ml of buffer F. The 
whole of the diluted sample was loaded onto an oligonucle- 
otide column capable of forming triple-helix type interac- 
tions with poly(GAA) sequences contained in the minicircle, 
prepared as described above. After washing with 10 ml of 
buffer F, elution is carried out with buffer E. The eluted 
sample is recovered in 1-ml fractions. 

By this method, purified DNA corresponding to the 
minicircle generated from pXL2793 is obtained. The DNA 
sample eluted from the column was analysed by agarose gel 
electrophoresis and ethidium bromide staining, and by 
enzyme restriction. For this purpose, the eluted fractions 
which were shown to contain DNA by assay at OD 260 nm 
were dialysed for 24 hours against 1 mM Tris, 1 mM EDTA, 
then precipitated with isopropanol and taken up in 200 ml of
H2O. Fifteen microliters of the sample thereby obtained
were digested with SalI, this restriction site being present in
the minicircle and not in the 3.9-kb plasmid generated by the 
recombination, or with XmnI, this restriction site being 
present in the 3.9-kb plasmid generated by the recombina- 
tion and not in the minicircle. The result obtained is pre- 
sented in FIG. 7, showing that the minicircle has been 
purified of the recombinant plasmid. 
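
The logic of this diagnostic digestion can be written out as a small lookup. The sketch below is only an illustration under stated assumptions: the site assignments and sizes are those given in the text, each enzyme is assumed to linearize the molecule that carries its site, and the names `MOLECULES`, `linearized_by` and `is_pure_minicircle` are hypothetical.

```python
# Site assignments as stated in the text: SalI is present only in the
# minicircle, XmnI only in the 3.9-kb plasmid. Sizes (kb) are from the text.
MOLECULES = {
    "minicircle": {"size_kb": 4.0, "sites": {"SalI"}},
    "3.9-kb plasmid": {"size_kb": 3.9, "sites": {"XmnI"}},
}


def linearized_by(enzyme):
    """Names of the circular molecules that carry a site for the enzyme."""
    return [name for name, mol in MOLECULES.items() if enzyme in mol["sites"]]


def is_pure_minicircle(cut_by_sali, cut_by_xmni):
    """A sample containing only minicircle is cut by SalI and resists XmnI."""
    return cut_by_sali and not cut_by_xmni


print(linearized_by("SalI"))   # ['minicircle']
print(linearized_by("XmnI"))   # ['3.9-kb plasmid']
```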

EXAMPLE 7 

In vivo Transfection of Mammalian Cells with a Minicircle
This example describes the transfer of a minicircle coding
for the luciferase gene into the brain of newborn mice. The
minicircle (30 µg) is diluted in sterile 150 mM NaCl to a
concentration of 1 µg/µl. A synthetic transfectant such as
dioctadecylamidoglycylspermine (DOGS) is then added in a
positive/negative charge ratio less than or equal to 2. The
mixture is vortexed, and 2 µg of DNA are injected into the
cerebral cortex of anaesthetized newborn mice using a 
micromanipulator and a microsyringe. The brains are 
removed 48 hours later, homogenized and centrifuged and 
the supernatant is used for the assay of luciferase by the 
protocols described (such as the Promega kit). 
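
The volumes implied by these quantities follow from simple division. The short sketch below assumes 30 µg of minicircle at 1 µg/µl and 2 µg of DNA per injection, and ignores any volume contributed by the transfectant; the helper name `volume_ul` is hypothetical.

```python
def volume_ul(mass_ug, conc_ug_per_ul):
    """Volume (µl) that contains mass_ug of DNA at the given concentration."""
    return mass_ug / conc_ug_per_ul


stock_volume = volume_ul(30, 1)       # 30 µg diluted to 1 µg/µl
injection_volume = volume_ul(2, 1)    # 2 µg of DNA per injection
max_injections = int(30 // 2)         # upper bound, ignoring dead volume

print(stock_volume, injection_volume, max_injections)  # 30.0 2.0 15
```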

EXAMPLE 8 

Use of the par Locus of RK2 to Reduce the Presence of 
Minicircle or Miniplasmid Topoisomers 

This example demonstrates the presence of topological
forms derived i) from the plasmid possessing the attP and
attB sequences in the direct orientation, ii) from the
minicircle or iii) from the miniplasmid, after the action of
the integrase of bacteriophage λ in E. coli. This example also
shows that these topological or oligomeric forms may be
resolved by using the par locus of RK2 (Gerlitz et al., 1990
J. Bacteriol. 172 p. 6194). In effect, this locus contains, in
particular, the parA gene coding for a resolvase acting at the
mrs (multimer resolution system) site (Eberl et al., 1994
Mol. Microbiol. 12 p. 131).

8-1. Construction of Plasmids pXL2777 and pXL2960

Plasmids pXL2777 and pXL2960 are derived from the
vector pXL2776, and possess in common the minimal
replicon of ColE1, the gene of the transposon Tn5 coding for
resistance to kanamycin and the attP and attB sequences of
bacteriophage λ in the direct orientation. These plasmids
differ in respect of the genes inserted between the attP and
attB sequences, in particular pXL2777 contains the omegon
cassette (coding for the gene for resistance to
spectinomycin) whereas plasmid pXL2960 carries par locus
of RK2.

8-1.1. Minimal vector pXL2658

The vector pXL2658 (2.513 kb) possesses the minimal
replicon of ColE1 originating from pBluescript (ori) and the
gene of the transposon Tn5 coding for resistance to
kanamycin (Km) as selectable marker. After the BsaI end has
been blunted by the action of the Klenow enzyme, the
1.15-kb BsaI-PvuII fragment of pBKS+ (obtained from
Stratagene) was cloned with the 1.2-kb SmaI fragment of
pUC4KIXX (obtained from Pharmacia) to generate the
plasmid pXL2647. The oligonucleotides 5542 5'(AGC TTC
TCG AGC TGC AGG ATA TCG AAT TCG GAT CCT CTA
GAG CGG CCG CGA GCT CC)3' (SEQ ID No.20) and
5543 5'(AGC TGG AGC TCG CGG CCG CTC TAG AGG
ATC CGA ATT CGA TAT CCT GCA GCT CGA GA)3'
(SEQ ID No.21) were hybridized with one another and then
cloned at the HindIII site of pXL2647; in this way pXL2658
is constructed. In this plasmid, the multiple cloning site is
SstI, NotI, XbaI, BamHI, EcoRI, EcoRV, PstI, XhoI and
HindIII between the origin of replication and the gene
coding for resistance to kanamycin.

8-1.2. Vector pXL2776 containing the attP and attB
sequences of phage λ

The vector pXL2776 (2.93 kb) possesses the minimal
replicon of ColE1 originating from pBluescript, the gene
coding for resistance to kanamycin and the attP and attB
sequences of bacteriophage λ in the direct orientation, see
FIG. 8. The 29-bp attB sequence (Mizuuchi et al., 1980 Proc.
Natl. Acad. Sci. USA 77 p. 3220) was introduced between
the SacI and HindIII restriction sites of pXL2658 after the
sense oligonucleotide 6194 5'(ACT AGT GGC CAT GCA
TCC GCT CAA GTT AGT ATA AAA AAG CAG GCT TCA
G)3' (SEQ ID No.22) has been hybridized with the antisense
oligonucleotide 6195 5'(AGC TCT GAA GCC TGC TTT
TTT ATA CTA ACT TGA GCG GAT GCA TGG CCA CTA
GTA GCT)3' (SEQ ID No.23) in such a way that the SacI
and HindIII sites are no longer re-formed after cloning. This
plasmid, the sequence of which was verified with respect to
attB, is then digested with SpeI and NsiI in order to
introduce in it the attP sequence flanked by the NsiI and
XbaI restriction sites and thus to generate plasmid pXL2776.
The attP sequence was obtained by PCR amplification using
plasmid pXL2649 (described in Example 2) as template, the
sense oligonucleotide 6190 5'(GCG TCT AGA ACA GTA
TCG TGA TGA CAG AG)3' (SEQ ID No.24) and the
antisense oligonucleotide 6191 5'(GCC AAG CTT AGC
TTT GCA CTG GAT TGC GA)3' (SEQ ID No.25), and
performing 30 cycles during which the hybridization
temperature is 50° C. The PCR product digested at the XbaI and
HindIII sites was cloned into the phage M13mpEH between
the XbaI and HindIII sites. The amplified sequence is
identical to the attP sequence described in Lambda II (edited
by R. W. Hendrix, J. W. Roberts, F. W. Stahl, R. A. Weisberg;
Cold Spring Harbor Laboratory 1983) between positions
27480 and 27863.

8-1.3. Plasmid pXL2777

Plasmid pXL2777 (6.9 kb) possesses the minimal
replicon of ColE1 originating from pBluescript, the gene coding
for resistance to kanamycin, the attP and attB sequences of
bacteriophage λ in the direct orientation and separated by the
sacB gene coding for levansucrase of B. subtilis (P. Gay et
al., 1983 J. Bacteriol. 153 p. 1424), and the Sp omegon
coding for the gene for resistance to spectinomycin Sp and
streptomycin Sm (P. Prentki et al., 1984 Gene 29 p. 303).
The sacB-Sp cassette having EcoRV and NsiI cloning ends
comes from the plasmid pXL2757 (FR95/01632) and was
cloned between the EcoRV and NsiI sites of pXL2776 to
form pXL2777.

8-1.4. Plasmid pXL2960

Plasmid pXL2960 (7.3 kb) possesses the minimal
replicon of ColE1 originating from pBluescript, the gene coding
for resistance to kanamycin and the attP and attB sequences
of bacteriophage λ in the direct orientation and separated by
i) the sacB gene coding for levansucrase of B. subtilis (P.
Gay et al., 1983 J. Bacteriol. 153 p. 1424) and ii) the par
locus of RK2 (Gerlitz et al., 1990 J. Bacteriol. 172 p. 6194).
The par cassette having BamHI ends comes from the
plasmid pXL2433 (PCT/FR95/01178) and was introduced
between the BamHI sites of pXL2777 to generate pXL2960.

8-2. Resolution of Minicircle or Miniplasmid Topoisomers

Plasmids pXL2777 and pXL2960 were introduced by
transformation into E. coli strain D1210HP. The
transformants were selected and analysed as described in Example
2, with the following modifications: the expression of the
integrase was induced at 42° C. for 15 min when the optical
density of the cells at 610 nm is 1.8, and the cells are then
incubated at 30° C. for 30 min, see FIG. 9, or for a period
varying from 2 minutes to 14 hours (O/N), see FIG. 10. The
plasmid DNA originating from uninduced and induced
cultures was then analysed on agarose gel before or after
digestion with a restriction enzyme exclusive to the
minicircle portion (EcoRI) or miniplasmid portion (BglII),
see Figure Y, or after the action of DNA topoisomerase A or
the gyrase of E. coli. The supercoiled dimer forms of
minicircle or miniplasmid are clearly revealed by i) their
molecular weight, ii) their linearization by the restriction
enzyme, iii) their change in topology through the action of
topoisomerase A (relaxed dimer) or of the gyrase
(supersupercoiled dimer), iv) specific hybridization with an
internal fragment peculiar to the minicircle or the
miniplasmid. Other topological forms of higher molecular weights
than that of the initial plasmid originate from the initial
plasmid or the minicircle or the miniplasmid, since they
disappear after digestion with the restriction enzyme
exclusive to the minicircle portion (EcoRI) or miniplasmid
portion (BglII). These forms are much less abundant with
pXL2960 than with pXL2777 as initial plasmid, see FIG. 10.
In particular, the dimer form of minicircle is present to a not
insignificant extent with plasmid pXL2777, whereas it is
invisible with plasmid pXL2960 when the cells are
incubated for at least 30 min at 30° C., see FIGS. 9 and 10. It
should be noted that minicircle dimers are observed at the
beginning of the kinetic experiment with pXL2960 (2 to 10
min), and are thereafter resolved (after 30 min), see FIG. 10.
Consequently, the par locus leads to a significant reduction
in the oligomeric/topological forms resulting from the action
of the integrase of bacteriophage λ in E. coli on plasmids
containing the attP and attB sequences in the direct
orientation.


IDENTIFICATION OF THE NUCLEOTIDE 
SEQUENCES 

SEQ ID No. 1: oligonucleotide 5476:

5'-AATTGTGAAGCCTGCTTTTTTATACTAA 

CTTGAGCGG-3' 
SEQ ID No. 2: oligonucleotide 5477 

5'-AATTCCGCTCAAGTTAGTATAAAAAAGC 

AGGCTTCAC-3' 
SEQ ID No. 3: oligonucleotide 4957:

5'-GATCCGAAGAAGAAGAAGAAGAAGAA

GAAGAAGAAGAAGAAGAAGAAGAAGAAG 

AAC-3' 

SEQ ID No. 4: oligonucleotide 4958: 
5'-GATCGTTCTTCTTCTTCTTCTTCTTCTTCTT
CTTCTTCTTCTTCTTCTTCTTCTTCG-3' 

SEQ ID No. 5: oligonucleotide poly-CTT: 
5'-GAGGCTTCTTCTTCTTCTTCTTCTT-3' 

SEQ ID No. 6: (attB sequence of phage lambda): 
5'-CTGCTTTTTTATACTAACTTG-3'

SEQ ID No. 7: (attP sequence of phage lambda): 
5'-CAGCTTTTTTATACTAAGTTG-3' 

SEQ ID No. 8: (attB sequence of phage P22): 
5'-CAGCGCATTCGTAATGCGAAG-3'



SEQ ID No. 11: (attP sequence of phage φ80):

5'-AACACTTTCTTAAATTGTC-3'
SEQ ID No. 12: (attB sequence of phage HP1): 

5'-AAGGGATTTAAAATCCCTC-3' 
SEQ ID No. 13: (attP sequence of phage HP1): 

5'-ATGGTATTTAAAATCCCTC-3' 
SEQ ID No. 14: (att sequence of plasmid pSAM2): 

5'-TTCTCTGTCGGGGTGGCGGGATTTGAAC 

CCACGACCTCTTCGTCCCGAA-3' 
SEQ ID No.15: (Recognition sequence of the resolvase of 

the transposon Tn3): 

5'-CGTCGAAATATTATAAATTATCAGACA-3'
SEQ ID No. 16: oligonucleotide 6006:

5'-GATCTGAAGAAGAAGAAGAAGAAGAAGA

AGAAGAAGAAGAAGAAGAAGAAGAAGA

ACTGCAGATCT-3'

SEQ ID No. 17: oligonucleotide 6008: 
5'-GATCAGATCTGCAGTTCTTCTTCTTCTTCT 
TCTTCTTCTTCTTCTTCTTCTTCTTCTTCTTC 
TTCA-3' 

SEQ ID No.18: (Sequence present in plasmid pXL2727 
corresponding to the oligonucleotide 6008): 
5'-GATCAGATCTGCAGTCTCTTCTTCTTCTTC 
TTCTTCTTCTTCTTCTTCTCTTCTTCA-3' 
SEQ ID No. 19: (oligonucleotide 116418): 

5'-CTTCTTCTTCTTCTTCTTCTT-3' 
SEQ ID No. 20: (oligonucleotide 5542): 
5'-AGCTTCTCGAGCTGCAGGATATCGAATTC 
GGATCCTCTAGAGCGGCCGCGAGCTCC-3' 
SEQ ID No. 21: (oligonucleotide 5543): 
5'-AGCTGGAGCTCGCGGCCGCTCTAGAGGA 
TCCGAATTCGATATCCTGCAGCTCGAGA-3' 
SEQ ID No. 22: sense oligonucleotide 6194: 
5'-ACTAGTGGCCATGCATCCGCTCAAGTTAG 
TATAAAAAAGCAGGCTTCAG-3' 
SEQ ID No. 23: antisense oligonucleotide 6195: 
5'-AGCTCTGAAGCCTGCTTTTTTATACTAACT 
TGAGCGGATGCATGGCCACTAGTAGCT-3' 
SEQ ID No. 24: sense oligonucleotide 6190: 
5'-GCGTCTAGAACAGTATCGTGATGACAGAG-3'
SEQ ID No. 25: antisense oligonucleotide 6191: 
5'-GCCAAGCTTAGCTTTGCACTGGATTGCGA-3' 
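
Several of the oligonucleotides listed above are used as annealed pairs. As a minimal sketch, the check below tests whether two oligonucleotides pair along their whole length except for a 5' overhang on each strand. The sequences are copied from SEQ ID Nos. 1, 2, 20 and 21; the four-base overhang length and the function names are assumptions made for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_with_overhangs(top, bottom, overhang=4):
    """True if the oligos pair everywhere except a 5' overhang on each strand."""
    return reverse_complement(top[overhang:]) == bottom[overhang:]


SEQ_ID_1 = "AATTGTGAAGCCTGCTTTTTTATACTAACTTGAGCGG"    # oligonucleotide 5476
SEQ_ID_2 = "AATTCCGCTCAAGTTAGTATAAAAAAGCAGGCTTCAC"    # oligonucleotide 5477
SEQ_ID_20 = "AGCTTCTCGAGCTGCAGGATATCGAATTCGGATCCTCTAGAGCGGCCGCGAGCTCC"  # 5542
SEQ_ID_21 = "AGCTGGAGCTCGCGGCCGCTCTAGAGGATCCGAATTCGATATCCTGCAGCTCGAGA"  # 5543

print(anneals_with_overhangs(SEQ_ID_1, SEQ_ID_2))    # True
print(anneals_with_overhangs(SEQ_ID_20, SEQ_ID_21))  # True
```

For the 5542/5543 pair, the AGCT overhangs left on each strand are consistent with cloning at a HindIII site, as described in Example 8.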
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SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(iii) NUMBER OF SEQUENCES: 25 



(2) INFORMATION FOR SEQ ID NO : 1 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY : linear 



6,143,530 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 1 
AATTGTGAAG CCTGCTTTTT TATACTAACT TGAGCGG 

(2) INFORMATION FOR SEQ ID NO: 2 : 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 
AATTCCGCTC AAGTTAGTAT AAAAAAGCAG GCTTCAC 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 3 : 
GATCCGAAGA AGAAGAAGAA GAAGAAGAAG AAGAAGAAGA AGAAGAAGAA GAAGAAC 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

"Oligonucleotide" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 
GATCGTTCTT CTTCTTCTTC TTCTTCTTCT TCTTCTTCTT CTTCTTCTTC TTCTTCG 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 
GAGGCTTCTT CTTCTTCTTC TTCTT 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 
CTGCTTTTTT ATACTAACTT G 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(ii) MOLECULE TYPE : other nucleic acid 

(A) DESCRIPTION: /desc = "Oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

CAGCTTTTTT ATACTAAGTT G 

(2) INFORMATION FOR SEQ ID NO: 8: 



(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "Oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

CAGCGCATTC GTAATGCGAA G 

(2) INFORMATION FOR SEQ ID NO: 9: 
(i) SEQUENCE CHARACTERISTICS:



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 
CTTATAATTC GTAATGCGAA G 

(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
AACACTTTCT TAAATGGTT 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid
(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

AACACTTTCT TAAATTGTC 

(2) INFORMATION FOR SEQ ID NO:12: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
AAGGGATTTA AAATCCCTC 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE : nucleic acid 

(C) STRANDEDNESS: double 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
ATGGTATTTA AAATCCCTC 

(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS:



(A) DESCRIPTION: /desc = "Oligonucleotide" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 
TTCTCTGTCG GGGTGGCGGG ATTTGAACCC ACGACCTCTT CGTCCCGAA 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 



(ii) MOLECULE TYPE: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 
CGTCGAAATA TTATAAATTA TCAGACA 

(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS:
(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

GATCTGAAGA AGAAGAAGAA GAAGAAGAAG AAGAAGAAGA AGAAGAAGAA GAAGAACTGC 



(2) INFORMATION FOR SEQ ID NO: 17: 
(i) SEQUENCE CHARACTERISTICS: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 
GATCAGATCT GCAGTTCTTC TTCTTCTTCT TCTTCTTCTT CTTCTTCTTC TTCTTCTTCT 
TCTTCA 

(2) INFORMATION FOR SEQ ID NO: 18: 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
GATCAGATCT GCAGTCTCTT 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double



(A) DESCRIPTION: /desc = "Oligonucleotide"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:



(2) INFORMATION FOR SEQ ID NO:20:



(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(D) TOPOLOGY: linear 

(A) DESCRIPTION: /desc = "Oligonucleotide" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 
AGCTTCTCGA GCTGCAGGAT ATCGAATTCG GATCCTCTAG AGCGGCCGCG AGCTCC
(2) INFORMATION FOR SEQ ID NO:21:



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
AGCTGGAGCT CGCGGCCGCT CTAGAGGATC CGAATTCGAT ATCCTGCAGC TCGAGA 

(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 49 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double



(A) DESCRIPTION: /desc = "Oligonucleotide"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:



(xi) SEQUENCE DESCRIPTION : SEQ ID NO:23: 

CCTGCTTTTT TATACTAACT TGAGCGGATG CATGGCCACT AGTAGCT 

(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH : 29 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:



(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS: 



(C) STRANDEDNESS: double 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Oligonucleotide" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

GCCAAGCTTA GCTTTGCACT GGATTGCGA 
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What is claimed is: 

1. A double-stranded DNA molecule, comprising an
expression cassette containing a gene of interest under
control of a transcription promoter and a transcription
terminator active in a mammalian cell, wherein said molecule:

is in circular and supercoiled form,
lacks an origin of replication,
lacks a marker gene, and

comprises a region resulting from site-specific
recombination between two sequences, said region being
located outside the expression cassette.

2. The molecule according to claim 1, further comprising
a sequence which interacts specifically with an
oligonucleotide to form a triple helix by hybridization.

3. The molecule according to claim 2, wherein the
sequence which forms a triple helix comprises from 5 to 30
base pairs.

4. The molecule according to claim 2, wherein the
sequence which forms a triple helix is a
homopurine-homopyrimidine sequence.

5. The molecule according to claim 1, wherein said region
results from site-specific recombination between two att
attachment sequences, two recognition sequences of a
resolvase of a transposon, or two mrs sequences of plasmid
RK2.

6. The molecule according to claim 1, further comprising
an mrs sequence originating from a par locus of RK2.

7. The molecule according to claim 1, wherein the gene of
interest is a nucleic acid coding for a therapeutic, vaccine,
agricultural, or veterinary product.

8. The molecule according to claim 1, wherein said
molecule is obtained by excision from a plasmid or
chromosome by site-specific recombination.

9. A recombinant DNA comprising a polynucleotide
comprising an expression cassette positioned between two
sequences positioned in direct orientation, which recombine
by site-specific recombination in the presence of a
recombinase, wherein said expression cassette comprises a
gene of interest under control of a transcription promoter
and a transcription terminator active in a mammalian cell,
and wherein said polynucleotide lacks an origin of
replication and a marker gene.

10. The recombinant DNA according to claim 9 further
comprising an origin of replication and, optionally, a marker
gene, wherein the origin of replication and optional marker
gene are located outside said polynucleotide.

11. The recombinant DNA according to claim 9, wherein
the recombinase is a recombinase of an integrase family of
phage lambda or of a resolvase family of transposon Tn3.

12. The recombinant DNA according to claim 9, wherein
the two sequences which recombine by site-specific
recombination are derived from a bacteriophage.

13. The recombinant DNA according to claim 12, wherein
the two sequences which recombine by site-specific
recombination consist of att attachment sequences of a
bacteriophage or sequences derived therefrom.

14. The recombinant DNA according to claim 13, wherein
the two sequences which recombine by site-specific
recombination consist of attachment sequences of bacteriophage
lambda, P22, φ80, P1, or HP1, of plasmid pSAM2, or of
sequences derived therefrom.

15. The recombinant DNA according to claim 14, wherein
the sequences which recombine by site-specific
recombination comprise all or part of SEQ ID Nos. 1, 2, 6, 7, 8, 9, 10,
11, 12, 13, or 14.

16. The recombinant DNA according to claim 12, wherein
the two sequences which recombine by site-specific
recombination are derived from bacteriophage P1.
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17. A plasmid comprising: 

a) a bacterial origin of replication and optionally, a marker 
gene; and 

b) a polynucleotide comprising an expression cassette
positioned between attP and attB sequences of a
bacteriophage lambda, P22, φ80, P1, or HP1, or of
plasmid pSAM2, positioned in direct orientation, which
recombine by site-specific recombination in the
presence of a recombinase, wherein said expression
cassette comprises a gene of interest under control of a
transcription promoter and a transcription terminator
active in a mammalian cell, and wherein said
polynucleotide lacks an origin of replication and a marker
gene.



18. The plasmid according to claim 17, wherein the attP 
and attB sequences which recombine by site-specific recom- 
bination are attachment sequences of bacteriophage lambda. 

19. A plasmid comprising: 

a) a bacterial origin of replication and optionally, a marker 
gene; and 

b) a polynucleotide comprising an expression cassette
positioned between two inverted repeat sequences of
bacteriophage P1 (loxP region) positioned in direct
orientation, which recombine by site-specific
recombination in the presence of a recombinase; wherein the
expression cassette comprises a gene of interest under 
control of a transcription promoter and a transcription 
terminator active in a mammalian cell, and wherein the 
polynucleotide lacks an origin of replication and a 
marker gene. 

20. The recombinant DNA according to claim 9, wherein
the two sequences which recombine by site-specific
recombination are derived from a transposon.

21. The recombinant DNA according to claim 20, wherein 
the two sequences which recombine by site-specific recom- 
bination consist of recognition sequences of a resolvase of a 
transposon Tn3, Tn21, or Tn522, or sequences derived 
therefrom. 

22. The recombinant DNA according to claim 21, wherein 
the two sequences which recombine by site-specific recom- 
bination comprise all or part of sequence SEQ ID No. 15. 

23. The recombinant DNA according to claim 9, wherein 
the two sequences which recombine by site-specific recom- 
bination are derived from a par region of plasmid RP4. 

24. The recombinant DNA according to claim 9, wherein 
the expression cassette further comprises a sequence which 
interacts specifically with an oligonucleotide to form a triple 
helix by hybridization. 

25. A plasmid comprising: 

a) an origin of replication and optionally, a marker gene; 
and 

b) a polynucleotide comprising at least one gene of 
interest and a sequence which interacts specifically 
with an oligonucleotide to form a triple helix by 
hybridization, wherein the at least one gene of interest 
and the oligonucleotide interacting sequence are posi- 
tioned between two sequences positioned in direct 
orientation, which recombine by site-specific recombi- 
nation in the presence of a recombinase, and wherein 
the polynucleotide lacks an origin of replication and a 
marker gene. 

26. A plasmid comprising: 

a) an origin of replication and optionally, a marker gene; 
and 

b) a polynucleotide comprising at least one gene of 
interest and an mrs sequence originating from a par 
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locus of plasmid RK2, wherein the at least one gene of 
interest and the mrs sequence are positioned between 
two sequences positioned in direct orientation, which 
recombine by site-specific recombination in the pres- 
ence of a recombinase, and wherein the polynucleotide 
lacks an origin of replication and a marker gene. 

27. The plasmid according to claim 26, wherein the
polynucleotide further comprises a sequence which interacts
specifically with an oligonucleotide to form a triple helix by
hybridization, wherein the oligonucleotide interacting
sequence is placed between the two sequences positioned in
direct orientation, which recombine by site-specific
recombination.

28. A plasmid comprising: 

a) an origin of replication and optionally, a marker gene; 
and 

b) a polynucleotide comprising:

1) a first set of two sequences positioned in direct
orientation, which recombine by integrase-dependent
site-specific recombination;

2) a second set of two sequences positioned in direct
orientation, which recombine by resolvase-dependent
site-specific recombination;

3) at least one gene of interest; and,

4) optionally, a sequence which interacts specifically
with an oligonucleotide to form a triple helix by
hybridization,

wherein each integrase-dependent sequence of 1) is
positioned next to a resolvase-dependent sequence of 2) and
wherein the at least one gene of 3) and the optional
oligonucleotide interacting sequence of 4) are placed between the
integrase-dependent/resolvase-dependent sequences, and
wherein the polynucleotide lacks an origin of replication and
a marker gene.

29. A cultured recombinant cell comprising one or more 
copies of the recombinant DNA according to claim 9 
inserted into its genome. 

30. A cultured recombinant cell comprising the recombi- 
nant DNA according to claim 10. 

31. The cultured recombinant cell according to claim 30, 
wherein said cell is a bacterium. 

32. The cultured recombinant cell according to claim 30,
wherein said cell is a eukaryotic cell.

33. The cultured recombinant cell according to claim 31, 
wherein the bacterium is Escherichia coli D1210HP with 
accession number 1-2314. 

34. A method for preparation of the DNA molecule
according to claim 1, comprising culturing 1) a host cell
comprising a recombinant DNA comprising a nucleic acid
consisting of an expression cassette positioned between two
sequences positioned in direct orientation, which recombine
by site-specific recombination in the presence of a
recombinase, and wherein the expression cassette comprises
a gene of interest under control of a transcription promoter
and a transcription terminator active in a mammalian cell
with 2) a recombinase, whereby site-specific recombination
occurs between the two sequences positioned in direct
orientation.

35. The method according to claim 34, wherein said
expression cassette is positioned between two bacteriophage
sequences, which are positioned in direct orientation and
recombine by site-specific recombination.

36. The method according to claim 34, wherein the
cultured host cell is brought into contact with the
recombinase by transfecting or infecting the cultured host cell with
a plasmid or a phage containing a gene for the recombinase.

37. The method according to claim 34, wherein the
cultured host cell is brought into contact with the
recombinase by inducing expression of a gene coding for the
recombinase, wherein the gene is present in the host cell.

38. The method according to claim 37, wherein the host
cell comprises within its genome a recombinase gene having
temperature-regulated expression, and wherein the cultured
host cell is brought into contact with the recombinase by
culturing the host cell at an induction temperature of the
recombinase gene, whereby expression of the recombinase
gene is induced.

39. The method according to claim 38, wherein the host 
cell comprises a lysogenic phage integrated in its genome 



40. A method for preparation of the DNA molecule 
according to claim 1, comprising combining: 

a) a replicative plasmid comprising: 

1) an origin of replication and optionally, a marker 
gene; and 

2) a polynucleotide comprising an expression cassette 
positioned between two sequences positioned in 
direct orientation, which recombine by site-specific 
recombination in the presence of a recombinase, 
wherein the expression cassette comprises a gene of 
interest under control of a transcription promoter and 
a transcription terminator active in a mammalian 
cell, and wherein the polynucleotide lacks an origin 
of replication and a marker gene; and 

b) a recombinase, whereby site-specific recombination 
occurs between the two sequences of 2) positioned in 
direct orientation. 

41. The method according to claim 34, further comprising 
purifying a minicircle formed by said site-specific recom- 
bination. 

42. The method according to claim 41, wherein the 
minicircle is purified by contacting the minicircle with a 
specific oligonucleotide that is grafted onto a support, 
whereby a triple helix is formed by hybridization of said 
specific oligonucleotide with a specific sequence present in 
the minicircle. 

43. The recombinant DNA according to claim 9, wherein
the two sequences which recombine by site-specific
recombination are from a 2µ plasmid.

Alterations in the Directionality of λ Site-specific
Recombination Catalyzed by Mutant Integrases in Vivo

Nicole Christ and Peter Droge*

Institute of Genetics, University of Cologne, Weyertal 121,
D-50931, Cologne, Germany

Phage λ integrative and excisive recombination normally
proceeds by a pair of sequential strand exchanges. During the
first exchange reaction, the "top" strand in each
recombination site is cleaved, exchanged, and religated,
generating a Holliday junction intermediate. This intermediate
DNA structure is resolved through a pair of reciprocal "bottom" strand
exchanges, leading to recombinant products. The strict co-ordination of
exchange reactions ensures religation between correct partner strands
only. Here we show that the directionality of recombination is altered
in vivo by two mutant integrases, Int-h (E174K) and a double mutant
Int-h/218 (E174K/E218K). This change in directionality leads to
deletion instead of inversion on substrates that carry inverted attachment
sites and, depending on the pair of target sites employed, requires the
presence or absence of integration host factor. Neither Fis nor Xis is
involved in deletion. Sequence analyses of deletion products reveal that
the newly generated hybrid attachment site exhibits a reversed genetic
polarity. We demonstrate that only one of two possible hybrid site
configurations is generated and discuss two pathways leading to deletion. In
the first, deletion results from a wrong alignment of the two
recombination sites within the synaptic complex. In the second pathway, the
uncoordinated cleavage by the mutant integrases of all four DNA strands
present in a conventional Holliday junction intermediate leads to two
double-stranded breaks, whereby the subsequent rejoining between
"wrong" partner strands appears restricted to only two strands.
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Introduction 



The phage λ-encoded integrase protein (Int) is
the prototype of the so-called integrase family
which catalyzes conservative site-specific
recombination between two DNA target sites. Int executes
both the integration and excision of the phage into
and out of the Escherichia coli genome, respectively.
The structure of the catalytic domain of Int has
recently been solved (Kwon et al., 1997). The Int
system represents, therefore, one of the best
understood recombination systems (for reviews, see
Landy, 1989, 1993; Sadowski, 1993; Nash, 1996;
Hallet & Sherratt, 1997; Yang & Mizuuchi, 1997).



Abbreviations used: Int, phage-encoded integrase; 
IHF, integration host factor; Fis, factor for inversion 
stimulation; Xis, phage-encoded excisionase. 

E-mail address of the corresponding author 
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Integrative and excisive recombination occurs 
between pairs of attachment sites, termed attP/attB
and attL/attR, respectively. Each att sequence is
composed of two core Int binding sites separated
by a seven base-pair overlap region. The overlap
sequence is identical in all wild-type att sites, and
identity is a prerequisite for efficient recombination.
In addition to core sites where strand cleavage and
religation occur, each site except attB contains
additional Int binding sites, so-called arm sites.
A varying number of flanking recognition sequences
for the accessory DNA-bending proteins integration
host factor (IHF), factor for inversion stimulation
(Fis), or the phage-encoded Xis protein are also
present in the flanking regions, again with the
exception of attB. Int is a heterobivalent DNA-binding
protein and, with assistance from the accessory
proteins, is able to bind simultaneously to core and
arm sites within the same att site (Moitoso de Vargas
et al., 1988, 1989). Depending on both the presence
and number of accessory factors, the resulting
nucleoprotein structures at attP, attL, and attR
exhibit different architectures, and it is this
difference that controls the directionality of the
reaction, i.e. integration or excision (Moitoso de
Vargas & Landy, 1991).

In the first step leading to integrative recombination,
a specialized nucleoprotein structure, the intasome,
is formed between Int, IHF, and supercoiled attP
(Better et al., 1982; Richet et al., 1986). The
second step involves the pairing of the intasome
with protein-free attB; the latter consists only of
two core sites and the overlap region. Hence, Int
monomers which catalytically act upon attB are in
this case exclusively provided from attP (Richet
et al., 1988; Patsey & Bruist, 1995). There is no
DNA topological constraint imposed on synapsis
between attP and attB, which explains why both
direct and inverted pairs of att sites present on the
same DNA molecule are efficiently recombined
in vitro and in vivo, leading to deletion or inversion
of the intervening DNA segment, respectively.
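
The relationship between site orientation and outcome described above can be illustrated with a toy model. The following Python sketch is not part of the original study: the function name `recombine`, the list representation of a circular plasmid, and the generic labels `site1`, `site2`, and `geneX` are assumptions made purely for illustration, and the model ignores the change in site identity (for example attB/attP to attL/attR) that accompanies recombination.

```python
def recombine(plasmid, i, j, orientation):
    """Toy model: conventional recombination between two sites on one plasmid.

    plasmid     -- list of segment labels read around the circle (hypothetical)
    i, j        -- indices of the two recombination sites, with i < j
    orientation -- "direct" or "inverted" (relative orientation of the sites)
    """
    if not 0 <= i < j < len(plasmid):
        raise ValueError("need 0 <= i < j < len(plasmid)")
    between = plasmid[i + 1:j]
    if orientation == "direct":
        # Direct repeats: the intervening segment leaves as a separate
        # circle that carries one of the two sites.
        return {
            "outcome": "deletion",
            "plasmid": plasmid[:i + 1] + plasmid[j + 1:],
            "excised": between + [plasmid[j]],
        }
    if orientation == "inverted":
        # Inverted repeats: the intervening segment is flipped in place.
        return {
            "outcome": "inversion",
            "plasmid": plasmid[:i + 1] + between[::-1] + plasmid[j:],
            "excised": [],
        }
    raise ValueError("orientation must be 'direct' or 'inverted'")


# Hypothetical substrate: origin, site, two intervening segments, site.
substrate = ["ori", "site1", "kan", "geneX", "site2"]
print(recombine(substrate, 1, 4, "direct")["plasmid"])
print(recombine(substrate, 1, 4, "inverted")["plasmid"])
```

In this sketch the excised circle lacks the origin of replication, which is why only the product retaining `ori` would be propagated in cells.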

In the third step, Int catalyzes a reciprocal 
exchange of the "top" strands at the left boundary 
of the overlap region, which results in a Holliday 
junction recombination intermediate (Figure 1(a) 
and (b)). This intermediate DNA structure is 

Figure 1. Conventional strand exchange during λ
integrative recombination. (a) attB (generic polarity BOB')
and attP (POP') are aligned in antiparallel orientation.
The Int monomer (filled oval) bound to either the B arm
(marked B) or P arm (marked P) initiates a nucleophilic
attack (filled arrows) against the top strand within each
att site. (b) Reciprocal strand exchange between top
strands has been completed, leading to a Holliday
junction intermediate structure. (c) The Int monomers bound
to the B' and P' arm have cleaved the bottom strands at
the right ends of both overlap regions and are thus
covalently linked to the DNA. (d) Reciprocal bottom
strand exchange has been completed, leading to attL
(generic polarity BOP') and attR (POB'), which are the
natural targets for excisive recombination.



resolved by exchange of the "bottom" strands at
the right boundary of the overlap region
(Figure 1(b) to (d)). Thus, Int executes an ordered,
sequential pair of strand exchanges, i.e. cleavage,
exchange, and rejoining of one pair of recombination
partner strands is completed before initiating
the same reactions on the other pair of strands
(Nunes-Düby et al., 1987; Kitts & Nash, 1988).
During these reactions, Int becomes covalently
attached in cis to the broken DNA strand through
a 3'-phosphotyrosine linkage, which is subsequently
resolved when the 5'-hydroxyl group of
the invading strand attacks the linkage and
displaces the recombinase (Figure 1(c) and (d);
Burgin & Nash, 1992; Nunes-Düby et al., 1994).
Neither supercoiling of attP nor the presence of
IHF seems to be required for catalysis of these
chemical reactions. Integrative recombination
eventually leads to the formation of attL and attR,
which are the targets for excision (Figure 1(d)).
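
The bookkeeping of core-site arms during conventional strand exchange can be sketched in a few lines of Python. This is an illustrative aid only, not something taken from the study: the dictionary `ATT_SITES` and the helpers `conventional_exchange`, `polarity`, and `site_name` are hypothetical names, and the sketch assumes the standard polarities attB = BOB', attP = POP', attL = BOP', and attR = POB'.

```python
# Core-site composition (left arm, right arm) of each att site; the
# overlap "O" lies between the two arms. Hypothetical lookup table.
ATT_SITES = {
    "attB": ("B", "B'"),
    "attP": ("P", "P'"),
    "attL": ("B", "P'"),
    "attR": ("P", "B'"),
}


def conventional_exchange(site_a, site_b):
    """Swap the right-hand arms of two att sites (hypothetical helper)."""
    left_a, right_a = ATT_SITES[site_a]
    left_b, right_b = ATT_SITES[site_b]
    return (left_a, right_b), (left_b, right_a)


def polarity(arms):
    """Render an arm pair as a genetic polarity string, e.g. BOP'."""
    return f"{arms[0]}O{arms[1]}"


def site_name(arms):
    """Return the att site with this arm composition, or None."""
    for name, composition in ATT_SITES.items():
        if composition == arms:
            return name
    return None


products = conventional_exchange("attB", "attP")
print([polarity(p) for p in products])   # ["BOP'", "POB'"]
print([site_name(p) for p in products])  # ['attL', 'attR']
```

Applying the same helper to attL and attR returns attB and attP, which mirrors the statement that excision is genetically the reversal of integration.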

Excisive recombination is genetically the exact
reversal of integration, but employs different
nucleoprotein structures on attL/attR. In addition
to IHF, which again serves mainly as an architectural
protein at these sites (Goodman et al., 1992),
the λ phage-encoded Xis protein is required for the
formation of a recombinogenic nucleoprotein
complex at attR. Thus, in contrast to integrative
recombination, two separate nucleoprotein structures are
formed before synapsis occurs by random collision
(Kim & Landy, 1992). It was also shown that Int, in
the absence of any accessory proteins, can align
attP and attL within a bi-molecular complex, and
can even recombine pairs of attL (Segall & Nash,
1993, 1996). However, the order of strand exchange
during excisive recombination is the same as
observed for integrative recombination.

According to a recently proposed model, the
switch from top to bottom strand exchange
involves isomerization of the Holliday junction
with concomitant restacking of base-pairs within
the overlap region (Nunes-Düby et al., 1995; Azaro
& Landy, 1997). How Int controls this step and,
thus, ensures the order of strand exchange during
integrative and excisive recombination is not
known. There is evidence that specific
protein-protein interactions are required between at least
three Int protomers bound to core sites within a
Holliday junction (Franz & Landy, 1990; Kho & Landy,
1994). In addition, Int mutants have been isolated
which indicate that specific interactions between
Int monomers are important in the co-ordination of
strand cleavage events within this intermediate
structure (Han et al.). Conformational
changes of both the catalytic domain and the
molecular interface within and between Int monomers,
respectively, are probably involved in co-ordinating
the sequence of strand exchanges. This can be
inferred from the structure of the Cre-loxP Holliday
junction intermediate (Gopaul et al., 1998). The
exact roles for IHF and Xis in this co-ordination are
also unclear. There is evidence that both proteins
control the efficiency of resolution of Holliday
junctions in the direction of recombinant products
(Franz & Landy, 1995).

Here, we show that two mutant Int variants,
E174K and E174K/E218K, alter the directionality
of recombination reactions in vivo, leading to
deletion instead of inversion on substrates that carry
two att sites as inverted repeats. The efficiency of
this reaction depends on the type of att sites
employed and the presence or absence of IHF.
Nearly 100% deletion occurs, for example, with
substrates bearing inverted attL and attP sites in
the presence of IHF. However, neither Xis nor Fis
is involved in deletion. Based on DNA sequence
information obtained from various deletion
products, we discuss two possible mechanisms
leading to deletion.



Results 

Integrase mutants 

Here, we analyze the in vivo catalytic activities of
two Int mutants. The first one, termed Int-h, was
originally identified by Miller et al. (1980) in a
screen for λ mutants that overcome the block
imposed on recombination by the himA42
mutation of E. coli. The mutation replaces a
glutamate residue with a lysine residue at codon 174
and maps within the N-terminal region of the
catalytic domain (Figure 2). Subsequent purification
and characterization of Int-h revealed that it
promotes integrative and excisive recombination in
the absence of accessory proteins and supercoiling,
albeit with significantly reduced efficiencies
(Lange-Gustafson & Nash, 1984). It is proposed
that this mutation results in an enhanced affinity
for core sites, which would account for the
increased frequency of in vivo and in vitro
integration into secondary att sites that deviate from
the wild-type attB sequence (Miller et al., 1980;
Patsey & Bruist, 1995).




Figure 2. Genetic map of λ Int. The map highlights
the three important functional domains of Int involved
in DNA arm-binding, core-binding, and catalysis
(Tirumalai et al., 1997). Numbers refer to amino acid
residues. Open circles indicate the position of Ala125
and Ala126, which make close contact with bases in the
core binding site (Tirumalai et al., 1998). Filled circles
mark the positions of three strictly conserved residues
within the Int family of recombinases, i.e. Arg212,
His308, and Arg311 (reviewed by Nunes-Düby et al.,
1998). The asterisk marks the active site residue, tyrosine
342. Crosses demarcate the positions of four residues
that seem involved in determining recombination
specificity with respect to target sites (Yagil et al., 1995;
Dorgai et al., 1995). Int-h and Int-h/218 mark the
corresponding amino acid changes within two mutant Int
variants analyzed in the present study (see the text).



Our initial goal was to further enhance the
ability of Int-h to perform integration into secondary
att sites in the absence of supercoiling. We thus
introduced, by PCR-directed mutagenesis, a second
mutation into the catalytic domain of Int-h, which
replaces a glutamate residue at codon 218 with a
lysine residue. Our choice for mutating this specific
residue is based on a recent finding (Wu et al.,
1997) that lysine at this position improves binding
of wild-type Int to core sites, presumably through
non-specific contact(s) with the DNA backbone.
We will subsequently refer to the double mutant as
Int-h/218 (Figure 2).

In vivo catalytic activities of Int-h and Int-h/218 

We first tested whether Int-h/218 retains the
ability of Int-h to promote integrative recombination
in the absence of IHF in vivo. Expression vectors
for Int, Int-h, and Int-h/218 were co-transformed
with pNC1, which carries attB and
attP as inverted repeats (Table 1), into either E. coli
strain CSH26 or an isogenic variant, CSH26ΔIHF.
In the latter strain, both genes encoding
subunits of IHF have been destroyed by transposition.
Resulting single colonies were cultivated overnight,
plasmid DNA isolated, subjected to restriction
digests, and analyzed by agarose gel
electrophoresis.

Int and the two variants perform efficient
integrative recombination in the presence of IHF,
leading to 100% inversion without induction of gene
expression by IPTG (Figure 3, lanes 4 to 7). Hence,
leaky Int expression from the ptrc promoter in the
presence of lac repressor bound to its operator
sequences is sufficient to promote complete
inversion over the time-course of the experiment.
However, in the absence of IHF, Int is completely
inactive and inversion catalyzed by Int-h is barely
detectable (lanes 8 and 9). Int-h/218 executes
inversion with comparably low efficiency, but we
were surprised to detect a more prominent product
band migrating above that of the expression vector
(demarcated del.; Figure 3, lane 10). The yield of
this new product increases over time (lane 11), but
varies considerably between experiments (data not
shown). The same product could also be detected
with Int-h in some but not all experiments
(Table 1).

Undigested plasmid DNA derived from a
sample obtained with Int-h/218 was resolved
through electrophoresis, the new product DNA
isolated and re-transformed into E. coli. Restriction
analysis of plasmid DNA from single colonies
revealed that the products contain a deletion
between attB and attP (data not shown). DNA
sequencing confirmed that notion. We found two
different products. The first, termed attA1, results
from recombination that joins the left core site of
attB, the so-called B arm, to the left core site of
attP, the P arm (see Figure 6(a)). In the following,
we refer to this type of new att site with reversed
genetic polarity, i.e. BOP instead of BOP', as a
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Table 1. DNA substrates and catalytic activities of mutant and wild-type integrase 





Columns: substrate; Int-h and Int-h/218 (inversion: +IHF | -IHF;
deletion: +IHF | -IHF); wild-type Int (inversion: +IHF | -IHF;
deletion: +IHF | -IHF).

[Table body with substrate diagrams and scores for pNC1 through pNC12
is not recoverable from the source.]

DNA substrates employed in this study are indicated. Arrows demarcate the attachment (att) sites, the open rectangle marks the
position of the kanamycin resistance gene, and the filled rectangle represents the pACYC origin of replication. att* refers to the
presence of a nucleotide change within the overlap region (see the text). The arm sites within each att and relevant positions of cleavage
sites for restriction enzymes are also indicated.

a The (+) sign refers to the presence of the corresponding DNA band after ethidium staining.

b Int-h is significantly less active than Int-h/218.

c Inversion observed after 48 hours.

d Complete loss of substrate vector.

e Barely detectable with Int-h/218 and Int-h after 24 and 48 hours, respectively.

f Not determined. The expected recombination products, i.e. inversion for pNC1 through pNC8 and deletion for pNC9 through
pNC12, are boxed.



hybrid site. Our analysis also revealed that in the
particular orientation chosen to depict attA1 (BOP)
with respect to attB/attP (see Figure 6(a)), the top
strand of the overlap region within attA1 is
derived from the bottom strand of attP. However,
the top strand of the overlap in the second
product, attA2, is derived from the top strand of attB
(see Figure 6(a)). From a total of 12 analyzed
sequences, each derived from a single colony after
re-transformation, we found that eight contain the
overlap provided from attP while four carry the
overlap region from attB.

Int-h is known to recombine pairs of att sites 
which differ in their overlap sequence by one or 
more base-pairs more efficiently than wild-type Int 
(Kitts & Nash, 1987; Patsey & Bruist, 1995). This 
feature is, at least in part, ascribed to the enhanced 
affinity of Int-h to core sites. In order to test 
whether deletion can be detected and, if so, how 



Int-h/218 performs on such a substrate, we
constructed pNC6, which contains attB and a variant
form of attP (attP*) as inverted repeats (Table 1).
Within attP* a guanine base replaces the third
nucleotide, a thymine residue, of the overlap
sequence. pNC6 was co-transformed with Int
expression vectors, and plasmid DNA was
subjected to restriction analysis. While in the absence
of IHF, all three Int variants are completely
inactive in integrative recombination (data not shown;
Table 1), both Int-h and, with enhanced efficiency,
Int-h/218 execute inversion in the presence of IHF
(Figure 4, lanes 2 and 3). However, both mutants
in addition generate a second prominent product
migrating at the top of the gel. The new product
was isolated and amplified as before, and its
sequence revealed that the B arm of attB is joined
to the P arm of attP, as observed before with
pNC1. In this case, however, we found that the

Figure 3. In vivo catalytic activities of wild-type and mutant Int on pNC1. pNC1, which carries attB and attP as
inverted repeats, was co-transformed into E. coli with expression vectors for either wild-type or mutant Int (see
Materials and Methods). Isolated plasmid DNA was incubated with AvaI and the resulting restriction fragments
analyzed through agarose gel electrophoresis. We show a gel after ethidium bromide staining. Lane 1, kb marker ladder;
lane 2, expression vector alone; lane 3, unrecombined pNC1 (note that the third restriction fragment is not shown);
lanes 4 to 6, DNA isolated from cells expressing Int, Int-h, or Int-h/218, respectively, in the presence of IHF; lane 7,
same as lane 6, but DNA was obtained from a different colony; lanes 8 to 10, DNA isolated from CSH26ΔIHF cells
expressing Int, Int-h, or Int-h/218, respectively; lane 11, same as lane 10, but DNA isolated after an additional
24 hours incubation. pNC1, unrecombined substrate DNA; expr. vec., Int expression vector; inv., one of two product
bands that results from inversion (the second co-migrates with the expression vector); del., deletion product which is
cleaved only once due to the absence of a second restriction site. The asterisk marks the position of a third product
band which is present in some experiments and which results from homologous recombination as determined by
restriction analysis.



overlap originates in 11 out of 12 analyzed 
sequences from attB (data not shown). 

Intrigued by the relatively high yield of deletion
products obtained with pNC6, we investigated
whether other pairs of inverted att sites can lead to
deletion. We therefore constructed two substrates
for excisive recombination, pNC2 and pNC7,
which contain attL and attR as inverted repeats
(Table 1). The attR site in pNC7, termed attR*,
carries the same mutated overlap sequence as present
in attP*. We found that in the absence of IHF (and
Xis), Int-h and Int-h/218 deleted the segment
between att sites in pNC2 with the same efficiency
as observed with pNC1. Deletion on pNC7 was
barely detectable with Int-h/218 in the presence of
IHF, and could only be detected after 48 hours of
in vivo incubation with Int-h. However, it is
noteworthy that we were unable to detect inversion on
pNC7 either in the presence or absence of IHF
(Table 1). This may indicate that deletion requires
initial steps of the conventional strand exchange
pathway (see Discussion).

The wild-type att sites tested so far are converted
by conventional strand exchange into the expected
recombination products (Table 1; see Figure 6(a)).
It is impossible to determine from these
experiments whether one or both pairs of sites, i.e.
(attB/attP) and/or (attL/attR), are the substrates
for deletion. We therefore constructed a substrate
that cannot be altered by conventional strand
exchange, pNC4, which carries attP and attL as
inverted repeats (Table 1). If recombination occurs
between these sites, this will lead to inversion.
Their genetic polarities do not change, however
(see Figure 6(b)).

We found that in the presence of IHF, nearly
100% of pNC4 is converted into a deletion
product upon reaction with either Int-h or Int-h/218
(Figure 5, lanes 5 and 6). Wild-type Int executes
only inversion (lane 4). In the absence of IHF,
Int and the two variants exclusively catalyze
inversion (lanes 7 to 9). Samples of undigested
deletion products obtained from Int-h/218 and
Int-h were isolated, re-transformed, and DNA
from ten and nine colonies, respectively,
subjected to DNA sequencing. In each case, we
found two hybrid att sites. The first, termed
attA3, was present three or two times,
respectively, and contains the B arm joined to the P
arm (Figure 6(b)). In the orientation chosen for
attA3 in Figure 6(b) (BOP), both arm sites are
separated by the overlap provided from the
bottom strand of attP. In the second product,
attA4, which was found seven times in each
case, the overlap originates from the top strand
of attL. From this analysis, we conclude that
inverted attL and attP can serve as substrate for
a highly efficient change in the directionality of
recombination by mutant Int in the presence of
IHF, leading to almost 100% deletion instead of
inversion.

We further tested whether other E. coli proteins, 
in addition to IHF, may play a role in deletion. At 
the present stage using various E. coli strains (see 
Materials and Methods), we found that recA, recB, 
and recC are not required for deletion with pNC4 
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Figure 4. In vivo catalytic activities of wild-type and 
mutant Int on pNC6. pNC6, which carries wild-type
attB and a mutated attP (attP*) as inverted repeats, was
co-transformed with Int expression vectors as before,
and plasmid DNA isolated and subjected to gel
electrophoresis after restriction digest with AvaI (Table 1).
Lanes 1 to 3, DNA isolated from cells expressing Int,
Int-h, or Int-h/218, respectively, in the presence of IHF;
lane 4, unrecombined pNC6; lane 5, expression vector
alone; lane 6, kb marker ladder. inv. demarcates the
position of the two product bands that result from
inversion; del. points to the position of the linearized
deletion product.



and pNC6. In addition, deletion occurs in the
absence of Fis on a derivative of pNC6 (data not
shown).

Only one of two possible new hybrid att sites
is generated

So far we have analyzed the products that result
from deletion on substrates that contain inverted
att sites, leading to the identification of a new
hybrid site with reversed genetic polarity, i.e. BOP
instead of BOP'. In order to address the question
whether mutant Int can also generate the
corresponding second hybrid site composed of B' (or P')
arm, overlap, and P' arm (B'OP'), we first
constructed a series of substrates that contain different
pairs of att sites as direct repeats (pNC9 to pNC12;
Table 1). If the mutant Int proteins execute a
complete set of alternative strand exchange reactions
which involves all four DNA strands present
within a synaptic complex, this should result in
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Figure 5. In vivo catalytic activities of wild-type and 
mutant Int on pNC4. pNC4, which carries inverted attL
and attP, was co-transformed with Int expression
vectors and isolated plasmid DNA processed as described
before, except that BamHI was used as endonuclease 
(compare to Table 1). Lanes 1 and 10, kb marker ladder; 
lane 2, expression vector alone; lane 3, unrecombined 
pNC4; lanes 4 to 6, DNA isolated from cells expressing 
Int, Int-h, or Int-h/218, respectively, in the presence of 
IHF; lanes 7 to 9, same as lanes 4 to 6, but in the 
absence of IHF. pNC4 demarcates the position of the 
three DNA fragments that result from digestion of 
unrecombined pNC4 (note that the two smaller frag- 
ments exhibit the same length and, thus, co-migrate); 
inv. indicates the position of two of three fragments that 
result from inversion (the third fragment exhibits the 
same size as the largest fragment obtained from un- 
recombined DNA); del. marks the position of one of 
two DNA fragments that result from deletion (the size 
of the second fragment does not change as a result of 
deletion). 



inversion instead of deletion. However, we found
that deletion occurs on all four substrates, but
inversion between directly repeated att sites was
not detectable, in either the presence or absence of
IHF. These experiments also revealed that Int-h/218
is significantly more active than Int-h in
catalyzing deletion on pNC9 and pNC10 in the
absence of IHF (Table 1). Hence, Int-h/218 exhibits
an enhanced ability to execute recombination on
wild-type att sites in the absence of accessory
factors IHF and Xis.

Our failure to detect the second hybrid site
could so far be due to the possibility that a
functional synaptic complex cannot be formed because
an unknown topological constraint is imposed on
synapsis with substrates carrying att sites as direct
repeats. In order to exclude this possibility, we
again employed the pair of attL and attP which
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Figure 6. (a) DNA sequences of deletion products 
obtained with pNC1. The sequences of the core and
overlap regions from attB and attP are shown at the left side.
Both att sites are aligned in parallel, as indicated by
black arrows. The capital letters B, B', P, and P' mark
the corresponding core site sequences. The two overlap
sequences are boxed and marked as O. The open arrow
heads point to the positions of top and bottom strand
cleavage by Int. Depicted at the top right are the
sequences and genetic polarities of the two products,
attL and attR, that result from conventional integrative
recombination. Shown at the bottom right are the DNA
sequences and genetic polarities of the two hybrid att
sites, termed attA1 and attA2, which are present on
deletion products. Note that in attA1 the overlap sequence
is provided from the P arm, while that present in attA2
comes from the B arm. (b) DNA sequences of deletion
products obtained with pNC4. The sequences of attL
and attP are depicted at the left side. Both sites are
aligned again in parallel. Symbols are as described in
(a). The two products of conventional strand exchange
are shown at the top right. Note that the composition of
sites with respect to both the core sites and their
flanking regions does not change during conventional strand
exchange. Depicted at the bottom right are the
sequences of the two hybrid att sites, attA3 and attA4,
which are identical with attA1 and attA2, respectively.



gives the highest yield of deletion products with
pNC4. In this case, however, we placed the
inverted att sites in a different orientation with
respect to both the plasmid origin and the
resistance marker gene (pNC5; Table 1). If deletion
occurs, a second hybrid att site (P'OP') should be
generated that can be propagated by plasmid
replication. The first identified site, BOP, will in this
case be lost because the deleted DNA segment
does not contain a replication origin. While we
were able to detect inversion in the absence of IHF,
products that result from deletion in the presence
of IHF are missing. Instead, we observed that cells
completely degrade pNC5 when either Int-h or
Int-h/218 is present (Table 1). To test whether plasmid
degradation is due to the instability of the expected
recombination product carrying P'OP', we
constructed a plasmid designated pP'OP' which
contains the equivalent sequence of one of the two
expected recombination products (see Materials
and Methods). We found, however, that pP'OP' is
stably propagated in the presence of Int-h (data
not shown). Hence, the instability of pNC5 cannot
be traced back to the instability of the expected
recombination product carrying the second hybrid
site. These results in conjunction with our failure to
detect inversion on substrates pNC9 to pNC12
might indicate that both mutant Int proteins
efficiently execute strand cleavage without subsequent
ligation to generate the second hybrid att site.



Discussion 

Integrative and excisive recombination performed
by Int normally lead to inversion of DNA
segments when the corresponding pair of target
sites is present as an inverted repeat on the same
DNA molecule. We have demonstrated in this
study that mutant Int, in addition, executes an
alternative reaction which leads to deletion. The
most efficient reaction, resulting in nearly 100%
deletion in the presence of IHF, occurs on pNC4
which carries attL and attP (Figure 5).

One possible model to account for the results is
that the two mutant Int proteins have lost the
ability to distinguish between the core binding sites
present to the left and to the right of the overlap
region. This would allow synapsis between two att
sites in the wrong orientation. Reciprocal top
strand exchange would then lead to mispaired top
strands because of non-complementary bottom
strands present in the resulting Holliday junction.
Despite these heterologies, catalytic events may
proceed normally due to the presence of mutant
Int proteins with an enhanced affinity for core
sites, and the Holliday junction may eventually be
resolved through reciprocal bottom strand
exchanges. The resulting heteroduplex structures at
the overlap region of both hybrid att sites could
then be resolved through repair and/or plasmid
replication.

While this presumably represents the most 
simple scenario leading to deletion, we think it is 
unlikely for two reasons. First, the model predicts 
that two hybrid alt sites should be generated due 
to reciprocal top and bottom strand exchanges. 
This should lead to inversion on substrates pNC9 
to pNC12, and to deletion on pNC4 and pNC5. 
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Our results show, however, that inversion is not 
detectable, that deletion occurs only on pNC4 and 
not on pNC5, and that pNC5 is lost despite selec- 
tion for the antibiotic resistance on this plasmid 
(Table 1). Second, the model predicts that pre-exist- 
ing heterologies between overlap sequences of a 
pair of alt sites should not be important because 
the strands will eventually be mispaired in a Holi- 
day junction after the first strand exchange is com- 
pleted. Hence, deletion on pNCl and pNC6, the 
latter carries attP*, should occur under the same 
conditions, i.e. in the presence or absence of THF, 
and presumably with comparable efficiencies. The 
results show, however, that deletion on pNCl 
occurs only in the absence of IHF while deletion 
on pNC6 is observed only in the presence of IHF 
and, in addition, occurs with an enhanced effi- 
ciency (Figures 3 and 4). Control experiments show 
that recombination between attP* and a variant of 
attB carrying the same nucleotide exchange within
the overlap region proceeds normally in the presence
of IHF (data not shown). It is unlikely, therefore,
that the different IHF requirements for
deletion on pNC1 and pNC6 are due to a sequence
effect imposed by the overlap in attP*. 

In the following, we propose a second possible 
pathway leading to deletion. We will focus our dis- 
cussion primarily on deletion observed with pNC4 
because this particular substrate has the advantage 
that the composition of attL and attP does not
change during the course of conventional strand 
exchange (Figure 6(b)). It is reasonable to assume 
that in the first step, nucleoprotein structures separately
assemble on attL and attP in the presence of
IHF (Segall & Nash, 1993). After site synapsis in 
the correct orientation, the first strand exchange is 
completed (Figure 7(a) and (b)). This results in a 
Holliday junction, which can be resolved normally
by reciprocal "bottom" strand exchanges (compare 
to Figure 1). However, we think that during a sub- 
sequent isomerization step, which is required to 
switch from top to bottom strand exchange 
(Nunes-Düby et al., 1995; Azaro & Landy, 1997),
Int-h and Int-h/218 accidentally cleave all four 
strands either sequentially or simultaneously. 
This will lead to two double-stranded breaks 
(Figure 7(c)). Based on DNA sequence information 
obtained from deletion products, we conclude that 
the 5'-OH group from the overlap strand extending 
the P arm (labelled p) engages in a nucleophilic 
attack on the 3'-phosphotyrosine linkage between 
Int-h and the B arm, hence replacing the recombi- 
nase. Likewise, the overlap strand still linked to 
the B arm (labelled b) attacks the Int-DNA linkage 
at the P arm (Figure 7(c) and (d)). Whether ligation 
of both strands occurs on the same DNA molecule 
in vivo is uncertain. It is possible that on individual 
substrate molecules, only one of these strands is 
ligated. If so, a subsequent repair involving gap 
filling has to occur on the second strand by E. coli 
proteins. However, this scenario would lead to the 
potential problem that one Int-h monomer remains 
covalently linked in cis to the corresponding core 




Figure 7. Schematic representation of one possible
reaction pathway leading to deletion on pNC4. (a) attL
and attP are depicted in antiparallel orientation within a
hypothetical synaptic complex. The Int monomer (filled
oval) bound to either the B or P arm initiates a nucleophilic
attack on the top strand within each att site (indicated
by curved arrows). (b) The first reciprocal pair of
strand exchanges between top strands has been completed,
leading to a Holliday junction intermediate structure.
(c) All four Int-h (Int-h/218) monomers engage in
a nucleophilic attack against their corresponding core
sites and become covalently connected in cis through a
3'-phosphotyrosine linkage. This will lead to a
double-stranded break within each att site. (c) and (d) The 5'-OH
from the overlap strand connected to the B arm in attL
(labelled b) is rejoined with the P arm from attP. Likewise,
the 5'-OH from the overlap strand which is still
connected with the P arm in attP (labelled p) is ligated
with the B arm in attL. This leads to the formation of a
new hybrid att site with correct chemical polarity, but
with reversed genetic polarity (i.e. BOP). Note that if
both overlap strands, p and b, are religated within the
same DNA molecule, the seven base-pair overlap region
will adopt a heteroduplex structure containing five out
of seven non-complementary bases. Based on the results
presented in this study, the corresponding second
hybrid att site (POP) cannot be formed through religation.
It is inferred that these DNAs which still carry Int
monomers covalently linked at the 3' ends will be
degraded by E. coli proteins.



site and has to be removed prior to ligation. At the
moment, we favor the possibility that both strands 
can be ligated on a single substrate molecule. This 
would lead to the formation of a seven base-pair 
heteroduplex structure in the overlap region of the 
new hybrid att site (Figure 7(d)), which could be 
resolved through repair and/or plasmid replica- 
tion. 
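
The "five out of seven non-complementary bases" quoted in the legend to Figure 7 can be checked with a short sketch. It assumes, as our reading of the model rather than a statement by the authors, that the two religated overlap strands p and b carry the same-sense seven-nucleotide overlap, and it takes the overlap sequence GTATAAA from the POP sequence given in Materials and Methods. The function name is illustrative only.

```python
# Sketch under stated assumptions: both religated overlap strands are taken to
# carry the same-sense 7-nt overlap, so they face each other antiparallel.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatches(strand_5to3: str, partner_5to3: str) -> int:
    """Count positions that cannot form a Watson-Crick pair when two
    strands, both written 5'->3', are annealed antiparallel."""
    assert len(strand_5to3) == len(partner_5to3)
    partner_3to5 = partner_5to3[::-1]  # reverse so position i faces position i
    return sum(COMPLEMENT[a] != b for a, b in zip(strand_5to3, partner_3to5))


# Overlap as it appears in the POP sequence (assumed partition: 7-nt core,
# 7-nt overlap, 7-nt core).
OVERLAP = "GTATAAA"

print(mismatches(OVERLAP, OVERLAP))    # heteroduplex of two same-sense strands
print(mismatches(OVERLAP, "TTTATAC"))  # normal duplex with the true complement
```

Under these assumptions the first call reports five unpaired positions and the second reports none, which agrees with the heteroduplex described in the legend.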

We have shown that the corresponding second
hybrid att site cannot be generated by the mutant
Int. Based on our observation that the substrate
DNA is lost with pNC5 in the presence of IHF, we 
think it is likely that E. coli enzymes degrade the 
linearized, origin-bearing DNA that contains Int-h 
monomers covalently linked to the 3' ends 
(Figure 7(d)). This implies that double-stranded 
cleavage is efficient and eventually occurs on the 
entire population of pNC5 molecules inside E. coli. 
This, in turn, is in agreement with our observation 
of nearly 100% deletion on pNC4. Deletion on 
pNC2, on the other hand, is much less efficient. 
This can explain why pNC3 is stably propagated 
(Table 1). The fact that we were unable to identify 
the second hybrid att site suggests furthermore 
that strand ligation occurs within the framework of 
a synaptic complex, and not by random collision of 
freed DNA ends. 

The role(s) of the accessory protein IHF in chan- 
ging the directionality of strand exchange is puz- 
zling. IHF is strictly required for deletion on pNC4 
(Figure 5 and Table 1). One possibility is that 
additional Int-h (Int-h/218) monomers are delivered
to the core sites of attL from its flanking arm
binding sites via IHF-induced DNA-bending
(Moitoso de Vargas et al., 1988, 1989). Perhaps the presence
of additional mutant Int monomers with an
enhanced affinity for core sites interferes with the 
co-ordination of strand cleavage during isomeriza- 
tion of Holliday junctions. In contrast, deletion on 
pNC1 is only observed in the absence of IHF
(Figure 3). IHF and Xis seem to direct Holliday 
junction resolution towards recombinant products 
in integrative and excisive recombination (Franz & 
Landy, 1995). It is possible that without these 
accessory proteins, the isomerization step required 
to switch to bottom strand exchange is impaired, 
so that the amount of this intermediate structure 
transiently increases. This, in turn, could result in a 
"cleavable synaptic complex" (Figure 7(c)) due to 
the presence of mutant instead of wild-type Int. 
A similar reasoning could account for deletion 
observed with substrates carrying inverted pairs of 
wild-type attL and attR (pNC2; Table 1). The situation
is different again with pNC6 and pNC7,
which require IHF for deletion (Figure 4 and 
Table 1). Since the overlap sequence in these pairs 
of att sites differs at position 0, it is likely that IHF 
is required to overcome an impairment in the iso- 
merization step imposed by such a heterology. 
This again could lead to a transient increase in the 
amount of Holliday junctions, which are either 
resolved through reciprocal bottom strand 
exchange or cleaved and rejoined by mutant Int to 
yield deletion products. 

The formation of so-called "contrary" recombi- 
nant products by wild-type Int has been observed 
before in in vitro studies using either heteroduplex 
or half-att site substrates (Nash & Robertson, 1989;
Nunes-Düby et al., 1989, 1997). These products,
termed Y-structures, contain one recombinant 
strand that results from conventional top strand 
exchange, while the second strand shows normal 
chemical polarity but with reversed genetic 



polarity (e.g. BOP). Complete new hybrid att sites 
with reversed genetic polarity in both strands, as 
shown in the present study, were not observed. 
These Y-structures most likely result as a direct 
consequence of the aberrant structures of att sites. 
However, an important finding of these studies is 
that Int can join strands indiscriminately in the
absence of complementary (homologous) strands. 

The present study shows that true contrary 
recombinant products are generated in vivo with 
wild-type substrates for integrative and excisive 
recombination in the absence of IHF. It is import- 
ant to note that these products are observed only 
with mutant Int. It is therefore an intrinsic property 
of Int-h and Int-h/218, and not that of a particular 
recombination substrate, that leads to the observed 
change in directionality of recombination. It is 
possible that the presence of a lysine residue 
instead of a glutamate residue at position 174 
within the catalytic domain of Int somehow inter- 
feres with the normal communication between Int 
monomers within a Holliday junction. This could 
lead to an unco-ordinated nucleophilic attack on
all four strands. Alternatively, or in addition, the 
presence of this lysine residue may interfere with 
the normal isomerization step of Holliday junc- 
tions, leading to a conformational change within 
this intermediate DNA structure that allows four 
Int monomers to attack the DNA backbones. An 
inspection of the recently solved structures of the 
Cre-lox recombination synapse and Holliday junction
may be informative here (Gopaul et al., 1998;
Guo et al., 1997). In the recombination synapse, ala- 
nine 131 and lysine 132 from the Cre subunit that 
has cleaved the loxA site are in contact with DNA. 
A sequence comparison of 105 members of the Int 
family shows that lysine 174 in Int-h aligns with 
alanine 131 in Cre (Nunes-Düby et al., 1998). This
is consistent with our hypothesis that Int-h (Int-h/ 
218) interferes with the isomerization of Holliday 
junctions required to switch from top to bottom 
strand exchange, possibly through additional DNA 
contact(s) within this intermediate structure.



Materials and Methods 

Bacterial strains

The following E. coli strains were employed in this
study. DH5α (supE44 ΔlacU169 (φ80lacZΔM15) hsdR17
recA1 endA1 gyrA96 thi-1 relA1; Hanahan, 1983); DH1 (F⁻
supE44 recA1 endA1 gyrA96 thi-1 hsdR17 relA1 λ⁻)
(Hanahan, 1983); JC5547 (thr-1 ara-14 leuB6 Δ(gpt-proA)62
lacY1 tsx-33 glnV44(AS) galK2(Oc) λ⁻ hisG4(Oc)
recA13 recC22 recB21 rpsL31(strR) xylA5 mtl-1 argE3(Oc)
thi-1; Willetts & Clark, 1969); CSH26 (F⁻ ara Δ(lac pro) thi;
Miller, 1972); CSH26ΔIHF (F⁻ ara Δ(lac pro) thi
himAΔ82::Tn10(TcR) himDΔ3::cat(CmR); kindly provided
by B. Rak, Freiburg, Germany); CSH50 (F⁻ ara Δ(lac pro)
thi strA; Miller, 1972); and CSH50ΔFis (F⁻ ara Δ(lac pro)
thi strA fis::kan; Koch et al., 1988).



Construction of integrase expression vectors

Int and Int-h were subcloned by PCR from pHN1 and
pHN16 (Honigman et al., 1979; Lange-Gustafson &
Nash, 1984), respectively. Both genes were introduced
into the polylinker from pTrc99A (Pharmacia) in which
the NcoI site has been destroyed. Expression from
pTrc99A is under the control of the strong trc promoter
containing the trp (-35) and the lac UV5 (-10) region
separated by 17 bp. Expression is regulated by the
lacIq gene product also encoded on the same
plasmid. Both genes were amplified using the following
primers: "IntproNl" which binds at the 5' end
(5'-GCTCTAGAATGGGAAGAAGGCGAAGTCA-3') and
"IntproCO" binding at the 3' end
(5'-AAAACTGCAGTCATTATTTGATTTCAATTTTGTCCC-3'). PCR products
were resolved through agarose gel electrophoresis,
isolated, and digested with XbaI and PstI. The fragments
were cloned into pTrc99A which was linearized by
XbaI/PstI, and the resulting expression vectors (pTrcInt
and pTrcInt-h) amplified in DH5α.

The double mutant Int-h/218 was generated from 
pTrcInt-h by PCR-directed mutagenesis, which substi- 
tuted the guanine at position 652 of the Int-h gene by an 
adenine residue. The resulting codon change at position 
218 replaces the glutamate residue with a lysine residue. 
The following oligonucleotides were used as primers: 
ncla218KR containing the altered nucleotide anneals at
the coding strand of the Int-h gene
(5'-GGTGATTTATGCAAAATGAAGTGGTCTGATATCGTAGATGGA-3')
and nclb218KL anneals at the non-coding strand
and directs DNA synthesis in the opposite direction
(5'-AACTCGTTGCCCGGTAACAACAGCCAGTTCCATTGCAAG-3').
PCR was performed using the "Master
Mix Kit" (Qiagen, Germany). The resulting linear
expression vector was purified through agarose gel elec- 
trophoresis, isolated, and ligated after phosphorylation 
using T4 Ligase (New England Biolabs). Screening for a 
functional double mutant was performed through in vivo 
recombination assays (see below). The sequence of the 
sub-cloned Int, Int-h, and Int-h/218 gene was confirmed 
by DNA sequencing. 
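
The arithmetic linking nucleotide 652 to codon 218 can be sketched as follows. The sketch assumes 1-based numbering from the first base of the start codon and uses only the standard genetic code entries for glutamate and lysine. The helper names are illustrative and are not taken from the paper.

```python
# Sketch: map a nucleotide position to its codon and show that a G-to-A change
# at the first codon position converts either glutamate codon into a lysine codon.
CODON_TABLE = {"GAA": "Glu", "GAG": "Glu", "AAA": "Lys", "AAG": "Lys"}


def codon_of(nt_position: int) -> tuple:
    """Return (codon number, position within codon), both 1-based,
    for a 1-based nucleotide position counted from the start codon."""
    return (nt_position - 1) // 3 + 1, (nt_position - 1) % 3 + 1


def substitute(codon: str, pos_in_codon: int, base: str) -> str:
    """Replace the base at a 1-based position within a codon."""
    return codon[:pos_in_codon - 1] + base + codon[pos_in_codon:]


codon_number, pos = codon_of(652)
print(codon_number, pos)  # codon 218, first position

for glu_codon in ("GAA", "GAG"):
    mutant = substitute(glu_codon, pos, "A")
    print(glu_codon, CODON_TABLE[glu_codon], "->", mutant, CODON_TABLE[mutant])
```

Position 652 is the first base of codon 218, so the guanine-to-adenine exchange yields a lysine codon whichever glutamate codon is present, consistent with the substitution described above.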



Construction of recombination substrates 

Plasmids used as recombination substrates are listed
in Table 1 and are derivatives of pACYC177 carrying the
kanamycin resistance gene. pNC1 (9.13 kb) bearing
wild-type attP and attB as inverted repeats was obtained by
combining pAB3YC, a derivative of pAB3 (Droge &
Cozzarelli, 1989), with pACYC177. pAB3YC was partially
digested with NheI in order to remove a 1.4 kb
origin-containing fragment, and ligated with pACYC177
which was linearized at its unique NheI site. pNC2
(9.13 kb) was generated by in vivo recombination of
pNC1 using wild-type Int in the presence of IHF (see
below). pNC3 (9.13 kb) is a derivative of pNC2 in which
the orientation of attL and attR with respect to both the
origin and resistance gene has been changed. pNC4
(8.2 kb) was constructed by inserting a XmnI-EcoRV
attP-containing fragment from pNC1 into SacI-linearized
pACYC177. attL was obtained from pNC2 and subsequently
inserted by blunt-end ligation into the EagI site
of the attP-bearing plasmid. pNC5 (8.59 kb) was derived
from pNC2 by cloning attP, which is present on a NheI
fragment, into a different position of attR-deleted pNC2.
pNC6 (7.04 kb) was derived from pNC1 by replacing
attP with attP*. The modified attP site was generated by
PCR-directed mutagenesis and first cloned into pTZ18R
(Pharmacia). pNC7 (6.67 kb) is a derivative of pNC2 in
which attR has been replaced by attR*, the latter was
obtained from in vivo recombination between a derivative
of attB, termed attB*, and attP*. pNC8 (5.9 kb) is a
derivative of pNC1, in which attP is replaced by attR*.
pNC9 (9.13 kb) is a derivative of pNC1, in which the
orientation of attP has been inverted. pNC10 (9.13 kb) is
a derivative of pNC2 in which attR has been inverted
with respect to attL, so that both att sites are present as
direct repeats. pNC11 (8.20 kb) was constructed as
described for pNC4, but in this case selecting for the
presence of att sites as direct repeats. pNC12 (7.04 kb) is a
derivative of pNC1 in which attP was replaced by attP*
and screened for the desired orientation. pPOP was
generated by cloning two PCR-derived fragments into
BamHI/HindIII-cleaved pACYC184. For this, we used
the unique DdeI restriction site present in attP. The
sequence of the Int core binding sites and the overlap in
POP is as follows: 5'-CAACTTAGTATAAATAAGTTGGC-3'.
It therefore represents one of two possible
recombination products that were expected if
deletion occurs on pNC5.
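
The organization of the POP sequence can be illustrated with a brief sketch. The partition into a seven-nucleotide core, a seven-nucleotide overlap and a second seven-nucleotide core is our assumption, based on the seven base-pair overlap mentioned in the legend to Figure 7; the source does not mark the boundaries. The check shows that, under this partition, the two core binding sites are inverted repeats of each other.

```python
# Sketch: split the POP sequence into core / overlap / core (assumed 7-7-7
# partition) and test whether the two cores are reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


POP = "CAACTTAGTATAAATAAGTTGGC"  # sequence as given in the text
left_core, overlap, right_core = POP[:7], POP[7:14], POP[14:21]

print(left_core, overlap, right_core)
print(reverse_complement(left_core) == right_core)
```

With this partition the left core CAACTTA and the right core TAAGTTG are reverse complements, and the central seven nucleotides read GTATAAA.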



In vivo recombination 

Expression vectors and recombination substrates were 
co-transformed into the appropriate E. coli strains men- 
tioned in the text. After incubation of single colonies 
overnight in the presence of ampicillin to select for the 
expression vector and kanamycin to select for the sub- 
strate DNA, transformants were cultivated at 37 °C for 
an additional 17 hours under selection pressure, and 
plasmid DNA isolated by affinity chromatography (Qia- 
gen, Germany). Recombination was analyzed through 
restriction digests using the appropriate endonuclease
(Table 1) and subsequent separation of DNA fragments 
through agarose gel electrophoresis. In order to test 
whether recA, recB or recC may be required for deletion 
on pNC4 and pNC6, we employed E. coli strains DH1 
and JC5547. The requirement for Fis was tested by com- 
paring deletion on a derivative of pNC6, which carries a 
spectinomycin resistance gene, in CSH50 and CSH50ΔFis.



DNA sequencing 

Deletion products were sequenced using the fluor- 
escence-based 373A system (Applied Biosystems). The 
following two oligonucleotides were used as sequencing- 
primers: ATT-PC which anneals at the P arm in 
the direction of the overlap region
(5'-TTGATAGCTCTTCCGCTTTCTGTTACAGGTCACTAATACC-3'), and
ATT-BA which anneals at the complementary strand
within the B arm
(5'-GTCTAGCTAGCCGGGAAACTGAAAATGTGTTC-3'). The sequence of subcloned Int
genes and that of Int-h/218 was determined using three
oligonucleotides as primers. Two of them anneal within
pTrc99A either upstream or downstream of the polylinker.
The third anneals at nucleotide positions 331 to
348 within the Int gene.



Gel electrophoresis 

DNA was analyzed through agarose gel electrophor- 
esis (0.8%) in TBE buffer (90 mM Tris-borate (pH 8.3), 
25 mM EDTA). DNA was visualized by UV after staining
with ethidium bromide. Photographs were taken
with the Image Master® System (Pharmacia).



ABSTRACT



Positive-negative selector (PNS) vectors are provided for 
modifying a target DNA sequence contained in the genome 
of a target cell capable of homologous recombination. The 
vector comprises a first DNA sequence which contains at 
least one sequence portion which is substantially homolo- 
gous to a portion of a first region of a target DNA sequence. 
The vector also includes a second DNA sequence containing 
at least one sequence portion which is substantially homolo- 
gous to another portion of a second region of a target DNA 
sequence. A third DNA sequence is positioned between the 
first and second DNA sequences and encodes a positive 
selection marker which when expressed is functional in the 
target cell in which the vector is used. A fourth DNA 
sequence encoding a negative selection marker, also func- 
tional in the target cell, is positioned 5' to the first or 3' to the 
second DNA sequence and is substantially incapable of 
homologous recombination with the target DNA sequence. 
The invention also includes transformed cells containing at 
least one predetermined modification of a target DNA 
sequence contained in the genome of the cell. In addition, 
the invention includes organisms such as non-human transgenic
animals and plants which contain cells having predetermined
modifications of a target DNA sequence in the
genome of the organism.



POSITIVE-NEGATIVE SELECTION
METHODS AND VECTORS

This invention was funded under grant No. R01-GM-21168
issued by the U.S. Department of Health and Human
Services. This is a continuation of application Ser. No.
07/397,707, filed Aug. 22, 1989, now abandoned.

TECHNICAL FIELD OF THE INVENTION

The invention relates to cells and non-human organisms
containing predetermined genomic modifications of the
genetic material contained in such cells and organisms. The
invention also relates to methods and vectors for making
such modifications.

BACKGROUND OF THE INVENTION

Many unicellular and multicellular organisms have been
made containing genetic material which is not otherwise
normally found in the cell or organism. For example, bacteria,
such as E. coli, have been transformed with plasmids
which encode heterologous polypeptides, i.e., polypeptides
not normally associated with that bacterium. Such transformed
cells are routinely used to express the heterologous
gene to obtain the heterologous polypeptide. Yeasts, filamentous
fungi and animal cells have also been transformed
with genes encoding heterologous polypeptides. In the case
of bacteria, heterologous genes are readily maintained by
way of an extra chromosomal element such as a plasmid.
More complex cells and organisms such as filamentous
fungi, yeast and mammalian cells typically maintain the
heterologous DNA by way of integration of the foreign DNA
into the genome of the cell or organism. In the case of
mammalian cells and most multicellular organisms such
integration is most frequently random within the genome.

Transgenic animals containing heterologous genes have
also been made. For example, U.S. Pat. No. 4,736,866
discloses transgenic non-human mammals containing activated
oncogenes. Other reports for producing transgenic
animals include PCT Publication No. WO82/04443 (rabbit
β-globin gene DNA fragment injected into the pronucleus of
a mouse zygote); EPO Publication No. 0 264 166 (Hepatitis
B surface antigen and tPA genes under control of the whey
acid protein promotor for mammary tissue specific expression);
EPO Publication No. 0 247 494 (transgenic mice
containing heterologous genes encoding various forms of
insulin); PCT Publication No. WO88/00239 (tissue specific
expression of a transgene encoding factor IX under control
of a whey protein promotor); PCT Publication No. WO88/01648
(transgenic mammal having mammary secretory cells
incorporating a recombinant expression system comprising
a mammary lactogen-inducible regulatory region and a
structural region encoding a heterologous protein); and EPO
Publication No. 0 279 582 (tissue specific expression of
chloramphenicol acetyltransferase under control of rat
β-casein promotor in transgenic mice). The methods and
DNA constructs ("transgenes") used in making these transgenic
animals also result in the random integration of all or
part of the transgene into the genome of the organism.
Typically, such integration occurs in an early embryonic
stage of development which results in a mosaic transgenic
animal. Subsequent generations can be obtained, however,
wherein the randomly inserted transgene is contained in all
of the somatic cells of the transgenic animals.

Transgenic plants have also been produced. For example,
U.S. Pat. No. 4,801,540 to Hiatt, et al., discloses the
transformation of plant cells with a plant expression vector
containing tomato polygalacturonase (PG) oriented in the
opposite orientation for expression. The anti-sense RNA
expressed from this gene is capable of hybridizing with
endogenous PG mRNA to suppress translation. This inhibits
production of PG and as a consequence the hydrolysis of
pectin by PG in the tomato.

While the integration of heterologous DNA into cells and
organisms is potentially useful to produce transformed cells
and organisms which are capable of expressing desired
genes and/or polypeptides, many problems are associated
with such systems. A major problem resides in the random
pattern of integration of the heterologous gene into the
genome of cells derived from multicellular organisms such
as mammalian cells. This often results in a wide variation in
the level of expression of such heterologous genes among
different transformed cells. Further, random integration of
heterologous DNA into the genome may disrupt endogenous
genes which are necessary for the maturation, differentiation
and/or viability of the cells or organism. In the case of
transgenic animals, gross abnormalities are often caused by
random integration of the transgene and gross rearrangements
of the transgene and/or endogenous DNA often occur
at the insertion site. For example, a common problem
associated with transgenes designed for tissue-specific
expression involves the "leakage" of expression of the
transgenes. Thus, transgenes designed for the expression and
secretion of a heterologous polypeptide in mammary secretory
cells may also be expressed in brain tissue thereby
producing adverse effects in the transgenic animal. While
the reasons for transgene "leakage" and gross rearrangements
of heterologous and endogenous DNA are not known
with certainty, random integration is a potential cause of
expression leakage.

One approach to overcome problems associated with
random integration involves the use of gene targeting. This
method involves the selection for homologous recombination
events between DNA sequences residing in the genome
of a cell or organism and newly introduced DNA sequences.
This provides means for systematically altering the genome
of the cell or organism.

For example, Hinnen, J. B., et al. (1978) Proc. Natl. Acad.
Sci. U.S.A., 75, 1929-1933 report homologous recombination
between a leu2⁺ plasmid and a leu2⁻ gene in the yeast
genome. Successful homologous transformants were positively
selected by growth on media deficient in leucine.
For mammalian systems, several laboratories have
reported the insertion of exogenous DNA sequences into
specific sites within the mammalian genome by way of
homologous recombination. For example, Smithies, O., et
al. (1985) Nature, 230-234 report the insertion of a
linearized plasmid into the genome of cultured mammalian
cells near the β-globin gene by homologous recombination.
The modified locus so obtained contained inserted vector
sequences containing a neomycin resistance gene and a supF
gene encoding an amber suppressor t-RNA positioned
between the δ and β-globin structural genes. The homologous
insertion of this vector also resulted in the duplication
of some of the DNA sequence between the δ and β-globin
genes and part of the β-globin gene itself. Successful
transformants were selected using a neomycin related antibiotic.
Since most transformation events randomly inserted this
plasmid, insertion of this plasmid by homologous recombination
did not confer a selectable, cellular phenotype for
homologous recombination mediated transformation. A
laborious screening test for identifying predicted targeting
events using plasmid rescue of the supF marker in a phage
library prepared from pools of transfected colonies was
used. Sib selection utilizing this assay identified the
transformed cells in which homologous recombination had
occurred.


A significant problem encountered in detecting and isolating
cells, such as mammalian and plant cells, wherein
homologous recombination events have occurred lies in the
greater propensity for such cells to mediate non-homologous
recombination. See Roth, D. B., et al. (1985) Proc. Natl.
Acad. Sci. U.S.A., 82, 3355-3359; Roth, D. B., et al. (1985),
Mol. Cell. Biol., 5, 2599-2607; and Paszkowski, J., et al.
(1988), EMBO J., 7, 4021-4026. In order to identify
homologous recombination events among the vast pool of
random insertions generated by non-homologous recombination,
early gene targeting experiments in mammalian cells
were designed using cell lines carrying a mutated form of
either a neomycin resistance (neoʳ) gene or a herpes simplex
virus thymidine kinase (HSV-tk) gene, integrated randomly
into the host genome. Such exogenous defective genes were
then specifically repaired by homologous recombination
with newly introduced exogenous DNA carrying the same
gene bearing a different mutation. Productive gene targeting
events were identified by selection for cells with the wild
type phenotype, either by resistance to the drug G418 (neoʳ)
or ability to grow in HAT medium (tk⁺). See, e.g., Folger, K.
R., et al. (1984), Cold Spring Harbor Symp. Quant. Biol., 49,
123-138; Lin, F. L. et al. (1984), Cold Spring Harbor Symp.
Quant. Biol., 49, 139-149; Smithies, O., et al. (1984), Cold
Spring Harbor Symp. Quant. Biol., 49, 161-170; Smith, A.
J. H., et al. (1984), Cold Spring Harbor Symp. Quant. Biol.,
49, 171-181; Thomas, K. R., et al. (1986), Cell, 41, 419-428;
Thomas, K. R., et al. (1986), Nature, 324, 34-38;
Doetschman, T., et al. (1987), Nature, 330, 576-578; and Song,
Kuy-Young, et al. (1987), Proc. Natl. Acad. Sci. U.S.A., 84,
6820-6824. A similar approach has been used in plant cells
where partially deleted neomycin resistance genes reportedly
were randomly inserted into the genome of tobacco
plants. Transformation with vectors containing the deleted
sequences conferred resistance to neomycin in those plant
cells wherein homologous recombination occurred.
Paszkowski, J., et al. (1988), EMBO J., 7, 4021-4026.

A specific requirement and significant limitation to this
approach is the necessity that the targeted gene confer a
positive selection characteristic in those cells wherein
homologous recombination has occurred. In each of the
above cases, a defective exogenous positive selection
marker was inserted into the genome. Such a requirement
severely limits the utility of such systems to the detection of
homologous recombination events involving inserted
selectable genes.

In a related approach, Thomas, K. R., et al. (1987), Cell,
51, 503-512, report the disruption of a selectable endogenous
mouse gene by homologous recombination. In this
approach, a vector was constructed containing a neomycin
resistance gene inserted into sequences encoding an exon of
the mouse hypoxanthine phosphoribosyl transferase (Hprt)
gene. This endogenous gene was selected for two reasons.
First, the Hprt gene lies on the X-chromosome. Since
embryonic stem cells (ES cells) derived from male embryos
are hemizygous for Hprt, only a single copy of the Hprt gene
need be inactivated by homologous recombination to produce
a selectable phenotype. Second, selection procedures
are available for isolating Hprt⁻ mutants. Cells wherein
homologous recombination events occurred could thereafter
be positively selected by detecting cells resistant to neomycin
(neoʳ) and 6-thioguanine (Hprt⁻).
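
The double selection described in this paragraph can be summarized as a small truth table. The sketch below is illustrative only: the class and field names are hypothetical, and it assumes the usual behaviour of the two drugs, namely that survival in neomycin-type selection requires an expressed neo gene and that 6-thioguanine kills cells retaining functional Hprt.

```python
# Sketch: which transformant classes survive combined neomycin-type and
# 6-thioguanine selection (names and categories are illustrative).
from dataclasses import dataclass


@dataclass
class Clone:
    has_neo: bool         # neomycin resistance gene integrated somewhere in the genome
    hprt_disrupted: bool  # the single X-linked Hprt copy is interrupted by the insert


def survives_double_selection(clone: Clone) -> bool:
    """A clone survives only if it is neo-resistant and lacks functional Hprt."""
    return clone.has_neo and clone.hprt_disrupted


clones = {
    "untransformed": Clone(has_neo=False, hprt_disrupted=False),
    "random integration": Clone(has_neo=True, hprt_disrupted=False),
    "homologous recombination at Hprt": Clone(has_neo=True, hprt_disrupted=True),
}

for name, clone in clones.items():
    print(f"{name}: {'survives' if survives_double_selection(clone) else 'dies'}")
```

Only the class in which the neo insert disrupts Hprt passes both selections, which is why this scheme depends on the target gene itself conferring a selectable phenotype.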

A major limitation in the above methods has been the
requirement that the target sequence in the genome, either
endogenous or exogenous, confer a selection characteristic
to the cells in which homologous recombination has
occurred (i.e. neoʳ, tk⁺ or Hprt⁻). Further, for those gene
sequences which confer a selectable phenotype upon
homologous recombination (e.g. the Hprt gene), the formation
of such a selectable phenotype requires the disruption of
the endogenous gene.

The foregoing approaches to gene targeting are clearly not 
applicable to many emerging technologies. See, e.g. Fried- 
man, T. (1989), Science, 244, 1275-1281 (human gene 
therapy); Gasser, C. S., et al., Id., 1293-1299 (genetic 
engineering of plants); Pursel, I. G., et al., Id., 1281-1288
(genetic engineering of livestock); and Timberlake, W. E., et 
al., Id. et al., 13—13, 1312 (genetic engineering of filamen- 
tous fungi). Such techniques are generally not useful to 
isolate transformants wherein non-selectable endogenous 
genes are disrupted or modified by homologous recombina- 
tion. The above methods are also of little or no use for gene 
therapy because of the difficulty in selecting cells wherein 
the genetic defect has been corrected by way of homologous 
recombination. 

Recently, several laboratories have reported the expres- 
sion of an expression-defective exogenous selection marker 
after homologous integration into the genome of mammalian 
cells. Sedivy, J. M., et al. (1989), Proc. Nat. Acad. Sci. 
U.S.A., 86, 227-231, report targeted disruption of the hem- 
izygous polyomavirus middle-T antigen with a neomycin 
resistance gene lacking an initiation codon. Successful trans- 
formants were selected for resistance to G418. Jasin, M., et 
al. (1988), Genes and Development, 2, 1353-1363 report 
integration of an expression-defective gpt gene lacking the 
enhancer in its SV40 early promoter into the SV40 early 
region of a gene already integrated into the mammalian 
genome. Upon homologous recombination, the defective gpt 
gene acts as a selectable marker. 

Assays for detecting homologous recombination have
also recently been reported by several laboratories. Kim, H.
S., et al. (1988), Nucl. Acids Res., 16, 8887-8903, report
the use of the polymerase chain reaction (PCR) to identify
the disruption of the mouse hprt gene. A similar strategy has
been used by others to identify the disruption of the Hox 1.1
gene in mouse ES cells (Zimmer, A. P., et al. (1989),
Nature, 338, 150-153) and the disruption of the En-2 gene
by homologous recombination in embryonic stem cells
(Joyner, A. L., et al. (1989), Nature, 338, 153-156).

It is an object herein to provide methods whereby any
region of the genome of a cell or organism
may be modified and wherein such modified cells can be 
selected and enriched. 

It is a further object of the invention to provide novel 
vectors used in practicing the above methods of the inven- 
tion. 

Still further, an object of the invention is to provide 
transformed cells which have been modified by the methods 
and vectors of the invention to contain desired mutations in 
specific regions of the genome of the cell. 

Further, it is an object herein to provide non-human 
transgenic organisms, which contain cells having predeter- 
mined genomic modifications. 

The references discussed above are provided solely for 
their disclosure prior to the filing date of the present appli- 
cation. Nothing herein is to be construed as an admission 
that the inventors are not entitled to antedate such disclosure 
by virtue of prior invention. 



SUMMARY OF THE INVENTION 

In accordance with the above objects, positive-negative 
selector (PNS) vectors are provided for modifying a target 
DNA sequence contained in the genome of a target cell 
capable of homologous recombination. The vector com- 
prises a first DNA sequence which contains at least one 
sequence portion which is substantially homologous to a 
portion of a first region of a target DNA sequence. The 
vector also includes a second DNA sequence containing at 
least one sequence portion which is substantially homolo-
gous to another portion of a second region of a target DNA
sequence. A third DNA sequence is positioned between the
first and second DNA sequences and encodes a positive
selection marker which when expressed is functional in the
target cell in which the vector is used. A fourth DNA 
sequence encoding a negative selection marker, also func- 
tional in the target cell, is positioned 5' to the first or 3' to the 
second DNA sequence and is substantially incapable of 
homologous recombination with the target DNA sequence. 

The above PNS vector containing two homologous por- 
tions and a positive and a negative selection marker can be 
used in the methods of the invention to modify target DNA 
sequences. In this method, cells are first transfected with the 
above vector. During this transformation, the PNS vector is
most frequently randomly integrated into the genome of the
cell. In this case, substantially all of the PNS vector con-
taining the first, second, third and fourth DNA sequences is
inserted into the genome. However, some of the PNS vector
is integrated into the genome via homologous recombina-
tion. When homologous recombination occurs between the
homologous portions of the first and second DNA sequences
of the PNS vector and the corresponding homologous por-
tions of the endogenous target DNA of the cell, the fourth
DNA sequence containing the negative selection marker is
not incorporated into the genome. This is because the
negative selection marker lies outside of the regions of
homology in the endogenous target DNA sequence. As a
consequence, at least two cell populations are formed. That
cell population wherein random integration of the vector has
occurred can be selected against by way of the negative
selection marker contained in the fourth DNA sequence.
This is because random events occur by integration at the
ends of linear DNA. The other cell population, wherein gene
targeting has occurred by homologous recombination, is
positively selected by way of the positive selection marker
contained in the third DNA sequence of the vector. This cell
population does not contain the negative selection marker
and thus survives the negative selection. The net effect of
this positive-negative selection method is to substantially
enrich for transformed cells containing a modified target 
DNA sequence. 
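
The selection logic described above can be illustrated with a minimal Python sketch. It is a toy model and not part of the disclosed method: the `Cell` class, the function names and the population counts (1 targeted cell per 999 random integrants) are assumptions chosen for illustration, and the model idealizes selection by assuming that every random integrant retains an intact negative selection marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A transfected cell, described only by the markers it carries."""
    has_positive_marker: bool  # third DNA sequence present in the genome
    has_negative_marker: bool  # fourth DNA sequence present in the genome


def make_population(n_targeted, n_random, n_untransformed):
    """Build a mixed population; all counts are illustrative assumptions."""
    targeted = [Cell(True, False)] * n_targeted        # homologous recombination
    random_integrants = [Cell(True, True)] * n_random  # whole vector inserted
    untransformed = [Cell(False, False)] * n_untransformed
    return targeted + random_integrants + untransformed


def positive_selection(cells):
    """Keep only cells that carry the positive selection marker."""
    return [c for c in cells if c.has_positive_marker]


def negative_selection(cells):
    """Remove cells that carry the negative selection marker."""
    return [c for c in cells if not c.has_negative_marker]


def targeted_fraction(cells):
    """Fraction of cells that are positive-marker-only (gene targeted)."""
    if not cells:
        return 0.0
    hits = sum(1 for c in cells
               if c.has_positive_marker and not c.has_negative_marker)
    return hits / len(cells)


population = make_population(n_targeted=1, n_random=999, n_untransformed=9000)
after_positive = positive_selection(population)
after_both = negative_selection(after_positive)

print(len(after_positive), targeted_fraction(after_positive))
print(len(after_both), targeted_fraction(after_both))
```

In this idealized model, positive selection alone leaves the targeted cell at 1 in 1000 survivors, whereas applying both selections leaves only the targeted cell. Real populations would be expected to fall short of this, since some random integrants lose the negative selection marker.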

If in the above PNS vector, the third DNA sequence 
containing the positive selection marker is positioned 
between first and second DNA sequences corresponding to
DNA sequences encoding a portion of a polypeptide (e.g. 
within the exon of a eucaryotic organism) or within a 
regulatory region necessary for gene expression, homolo- 
gous recombination allows for the selection of cells wherein 
the gene containing such target DNA sequences is modified
such that it is non-functional.

If, however, the positive selection marker contained in the 
third DNA sequence of the PNS vector is positioned within
an untranslated region of the genome, e.g. within an intron 
in a eucaryotic gene, modifications of the surrounding target
sequence (e.g. exons and/or regulatory regions) by way of 
substitution, insertion and/or deletion of one or more nucle- 



otides may be made without eliminating the functional 
character of the target gene. 

The invention also includes transformed cells containing 
at least one predetermined modification of a target DNA 
sequence contained in the genome of the cell. 

In addition, the invention includes organisms such as 
non-human transgenic animals and plants which contain 
cells having predetermined modifications of a target DNA 
sequence in the genome of the organism. 

Various other aspects of the invention will be apparent 
from the following detailed description, appended drawings and claims.



BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 depicts the positive-negative selection (PNS) vec- 
tor of the invention and a target DNA sequence. 

FIGS. 2A and 2B depict the results of gene targeting 
(homologous recombination) and random integration of a 
PNS vector into a genome respectively. 

FIG. 3 depicts a PNS vector containing a positive selec- 
tion marker within a sequence corresponding, in part, to an 
intron of a target DNA sequence. 

FIG. 4 is a graphic representation of the absolute fre- 
quency of homologous recombination versus the amount of 
100% sequence homology in the first and second DNA 
sequences of the PNS vectors of the invention. 

FIGS. 5A, 5B, 5C and 5D depict the construction of a
PNS vector used to disrupt the INT-2 gene. 

FIG. 6 depicts the construction of a PNS vector used to 
disrupt the HOX1.4 gene.

FIGS. 7A, 7B and 7C depict the construction of a PNS 
vector used to transform endothelial cells to express factor
VIII.

FIG. 8 depicts a PNS vector to correct a defect in the 
purine nucleoside phosphorylase gene. 

FIG. 9 depicts a vector for promoterless PNS. 

FIG. 10 depicts the construction of a PNS vector to target 
an inducible promoter into the int-2 locus. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The positive-negative selection ("PNS") methods and 
vectors of the invention are used to modify target DNA 
sequences in the genome of cells capable of homologous
recombination.

A representation of a PNS vector of the invention is
shown in FIG. 1. As can be seen, the PNS vector comprises 
four DNA sequences. The first and second DNA sequences 
each contain portions which are substantially homologous to 
corresponding homologous portions in first and second 
regions of the targeted DNA. Substantial homology is nec- 
essary between these portions in the PNS vector and the 
target DNA to insure targeting of the PNS vector to the 
appropriate region of the genome. 

As used herein, a "target DNA sequence" is a predeter- 
mined region within the genome of a cell which is targeted 
for modification by the PNS vectors of the invention. Target 
DNA sequences include structural genes (i.e., DNA 
sequences encoding polypeptides including in the case of 
eucaryotes, introns and exons), regulatory sequences such as
enhancer sequences, promoters and the like and other
regions within the genome of interest. A target DNA 
sequence may also be a sequence which, when targeted by 



a vector has no effect on the function of the host genome. 
Generally, the target DNA contains at least first and second 
regions. See FIG. 1. Each region contains a homologous 
sequence portion which is used to design the PNS vector of 
the invention. In some instances, the target DNA sequence
also includes a third and in some cases a third and fourth 
region. The third and fourth regions are substantially con- 
tiguous with the homologous portions of the first and second 
region. The homologous portions of the target DNA are 
homologous to sequence portions contained in the PNS
vector. The third and in some cases third and fourth regions 
define genomic DNA sequences within the target DNA 
sequence which are not substantially homologous to the 
fourth and in some cases fourth and fifth DNA sequences of 
the PNS vector. 

Also included in the PNS vector are third and fourth DNA 
sequences which encode respectively "positive" and "nega- 
tive" selection markers. Examples of preferred positive and 
negative selection markers are listed in Table I. The third
DNA sequence encoding the positive selection marker is 
positioned between the first and second DNA sequences 
while the fourth DNA sequence encoding the negative 
selection marker is positioned either 3' to the second DNA 
sequence, as shown in FIG. 1, or 5' to the first DNA sequence
(not shown in FIG. 1). The positive and negative selection 
markers are chosen such that they are functional in the cells 
containing the target DNA. 

Positive and/or negative selection markers are "func-
tional" in transformed cells if the phenotype expressed by
the DNA sequences encoding such selection markers is 
capable of conferring either a positive or negative selection 
characteristic for the cell expressing that DNA sequence. 
Thus, "positive selection" comprises contacting cells trans-
fected with a PNS vector with an appropriate agent which
kills or otherwise selects against cells not containing an 
integrated positive selection marker. "Negative selection" on 
the other hand comprises contacting cells transfected with 
the PNS vector with an appropriate agent which kills or 
otherwise selects against cells containing the negative selec-
tion marker. Appropriate agents for use with specific posi-
tive and negative selection markers and appropriate concen-
trations are listed in Table I. Other positive selection markers
include DNA sequences encoding membrane bound 
polypeptides. Such polypeptides are well known to those
skilled in the art and contain a secretory sequence, an 
extracellular domain, a transmembrane domain and an intra- 
cellular domain. When expressed as a positive selection 
marker, such polypeptides associate with the target cell 
membrane. Fluorescently labelled antibodies specific for the
extracellular domain may then be used in a fluorescence
activated cell sorter (FACS) to select for cells expressing the 
membrane bound polypeptide. FACS selection may occur 
before or after negative selection. 

TABLE I

Selectable Markers for Use in PNS-Vectors

Markers: Hprt; HSV-tk; Ricin toxin
Agents: G418; Kanamycin; Hygromycin
Concentration and organism: 5-500, Plants; 10-1000, Eukaryotes

The expression of the negative selection marker in the fourth 
DNA sequence is generally under control of appropriate 
regulatory sequences which render its expression in the 
target cell independent of the expression of other sequences 
in the PNS vector or the target DNA. The positive selection 
marker in the third DNA, however, may be constructed so 
that it is independently expressed (e.g. when contained in an
intron of the target DNA) or constructed so that homologous 
recombination will place it under control of regulatory 
sequences in the target DNA sequence. The strategy and 
details of the expression of the positive selection marker will 
be discussed in more detail hereinafter. 

The positioning of the negative selection marker as being 
either "5'" or "3'" is to be understood as relating to the
positioning of the negative selection marker relative to the 5' 
or 3' end of one of the strands of the double-stranded PNS 
vector. This should be apparent from FIG. 1. The positioning 
of the various DNA sequences within the PNS vector, 
however, does not require that each of the four DNA 
sequences be transcriptionally and translationally aligned on 
a single strand of the PNS vector. Thus, for example, the first 
and second DNA sequences may have a 5' to 3' orientation 
consistent with the 5' to 3' orientation of regions 1 and 2 in 
the target DNA sequence. When so aligned, the PNS vector 
is a "replacement PNS vector". Upon homologous recombi-
nation, the replacement PNS vector replaces the genomic
DNA sequence between the homologous portions of the 
target DNA with the DNA sequences between the homolo- 
gous portion of the first and second DNA sequences of the 
PNS vector. Sequence replacement vectors are preferred in 
practicing the invention. Alternatively, the homologous por- 
tions of the first and second DNA sequence in the PNS 
vector may be inverted relative to each other such that the 
homologous portion of DNA sequence 1 corresponds 5' to 3' 
with the homologous portion of region 1 of the target DNA 
sequence whereas the homologous portion of DNA sequence 
2 in the PNS vector has an orientation which is 3' to 5'
relative to the homologous portion of the second region
of the target DNA sequence. This inverted orientation
provides for an "insertion PNS vector". When an insertion
PNS vector is homologously inserted into the target DNA 
sequence, the entire PNS vector is inserted into the target 
DNA sequence without replacing the homologous portions 
in the target DNA. The modified target DNA so obtained 5 
necessarily contains the duplication of at least those homolo- 
gous portions of the target DNA which are contained in the 
PNS vector. Sequence replacement vectors and sequence 
insertion vectors utilizing a positive selection marker only 
are described by Thomas et al. (1987), Cell, 51, 503-512.
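
A toy string model may help make the difference between homologous replacement and random integration concrete. The sketch below is illustrative only: the sequences, the bracketed marker labels and the function `replace_by_homology` are hypothetical, and exact substring matching stands in for substantial homology.

```python
def replace_by_homology(genome, arm1, insert, arm2):
    """Replace the genomic segment lying between arm1 and arm2 with `insert`.

    Returns the modified genome, or None when the two arms are not both
    found in order (i.e. no homologous target exists in this toy model).
    """
    start = genome.find(arm1)
    if start == -1:
        return None
    end_of_arm1 = start + len(arm1)
    arm2_start = genome.find(arm2, end_of_arm1)
    if arm2_start == -1:
        return None
    return genome[:end_of_arm1] + insert + genome[arm2_start:]


# Hypothetical toy sequences; the labels stand in for real DNA.
genome = "aaaa" + "ARM1" + "exon" + "ARM2" + "tttt"
vector = {"arm1": "ARM1", "positive": "[neo]", "arm2": "ARM2", "negative": "[tk]"}

# Homologous recombination: only the material between the arms is exchanged,
# so the negative marker outside the arms is never transferred.
modified = replace_by_homology(
    genome, vector["arm1"], vector["positive"], vector["arm2"]
)

# Random integration: the whole linear vector is inserted via its ends
# (position 2 is an arbitrary choice for the illustration).
random_integrant = genome[:2] + "".join(vector.values()) + genome[2:]

print(modified)
print(random_integrant)
```

The targeted genome gains the positive marker in place of the sequence between the arms and lacks the negative marker, while the random integrant carries both markers.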

Similarly, the third and fourth DNA sequences may be 
transcriptionally inverted relative to each other and to the 
transcriptional orientation of the target DNA sequence. This 
is only the case, however, when expression of the positive 
and/or negative selection marker in the third and/or fourth
DNA sequence respectively is independently controlled by 
appropriate regulatory sequences. When, for example a 
promoterless positive selection marker is used as a third 
DNA sequence such that its expression is to be placed under 
control of an endogenous regulatory region, such a vector
requires that the positive selection marker be positioned so 
that it is in proper alignment (5' to 3' and proper reading 
frame) with the transcriptional orientation and sequence of 
the endogenous regulatory region. 

Positive-negative selection requires that the fourth DNA
sequence encoding the negative marker be substantially 
incapable of homologous recombination with the target 
DNA sequence. In particular, the fourth DNA sequence 
should be substantially non-homologous to a third region of 
the target DNA. When the fourth DNA sequence is posi-
tioned 3' to the second DNA sequence, the fourth DNA
sequence is non-homologous to a third region of the target 
DNA which is adjacent to the second region of the target 
DNA. See FIG. 1. When the fourth DNA sequence is located 
5' to the first DNA sequence, it is non-homologous to a
fourth region of the target DNA sequence adjacent to the first 
region of the target DNA. 

In some cases, the PNS vector of the invention may be 
constructed with a fifth DNA sequence also encoding a 
negative selection marker. In such cases, the fifth DNA
sequence is positioned at the opposite end of the PNS vector 
to that containing the fourth DNA sequence. The fourth 
DNA sequence is substantially non-homologous to the third 
region of the target DNA and the fifth DNA sequence is 
substantially non-homologous to the fourth region of the
target DNA. The negative selection markers contained in 
such a PNS vector may either be the same or different 
negative selection markers. When they are different such 
that they require the use of two different agents to select 
against cells containing such negative markers, such negative
selection may be carried out sequentially or simultaneously 
with appropriate agents for the negative selection marker. 
The positioning of two negative selection markers at the 5' 
and 3' end of a PNS vector further enhances selection against
target cells which have randomly integrated the PNS vector.
This is because random integration sometimes results in the 
rearrangement of the PNS vector resulting in excision of all 
or part of the negative selection marker prior to random 
integration. When this occurs, cells randomly integrating the 
PNS vector cannot be selected against. However, the pres-
ence of a second negative selection marker on the PNS
vector substantially enhances the likelihood that random 
integration will result in the insertion of at least one of the 
two negative selection markers. 
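
The benefit of a second negative selection marker can be sketched numerically. The calculation below assumes, purely for illustration, that each marker is lost independently and with the same probability during random integration; the text makes no such quantitative claim, and both the function `escape_probability` and the value 0.25 are hypothetical.

```python
def escape_probability(loss_probability, n_markers):
    """Chance that a random integrant retains no intact negative marker.

    Assumes (as a simplification) that each of the n_markers is lost
    independently with the same probability.
    """
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    return loss_probability ** n_markers


p = 0.25  # hypothetical per-marker loss rate, not a value from the text
one_marker = escape_probability(p, 1)
two_markers = escape_probability(p, 2)

print(one_marker, two_markers)
```

Under these assumptions, fewer random integrants escape negative selection when two markers are present, because both would have to be lost.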

The substantial non-homology between the fourth DNA
sequence (and in some cases fourth and fifth DNA
sequences) of the PNS vector and the target DNA creates a
discontinuity in sequence homology at or near the juncture
of the fourth DNA sequence. Thus, when the vector is 
integrated into the genome by way of the homologous 
recombination mechanism of the cell, the negative selection 
marker in the fourth DNA sequence is not transferred into 
the target DNA. It is the non-integration of this negative 
selection marker during homologous recombination which 
forms the basis of the PNS method of the invention. 

As used herein, a "modifying DNA sequence" is a DNA 
sequence contained in the first, second and/or third DNA 
sequence which encodes the substitution, insertion and/or 
deletion of one or more nucleotides in the target DNA 
sequence after homologous insertion of the PNS vector into 
the targeted region of the genome. When the PNS vector 
contains only the insertion of the third DNA sequence 
encoding the positive selection marker, the third DNA 
sequence is sometimes referred to as a "first modifying DNA 
sequence". When in addition to the third DNA sequence, the 
PNS vector also encodes the further substitution, insertion 
and/or deletion of one or more nucleotides, that portion 
encoding such further modification is sometimes referred to 
as a "second modifying DNA sequence". The second modi- 
fying DNA sequence may comprise the entire first and/or 
second DNA sequence or in some instances may comprise 
less than the entire first and/or second DNA sequence. The 
latter case typically arises when, for example, a heterologous 
gene is incorporated into a PNS vector which is designed to 
place that heterologous gene under the regulatory control of 
endogenous regulatory sequences. In such a case, the 
homologous portion of, for example, the first DNA sequence 
may comprise all or part of the targeted endogenous regu- 
latory sequence and the modifying DNA sequence com- 
prises that portion of the first DNA sequence (and in some 
cases a part of the second DNA sequence as well) which 
encodes the heterologous DNA sequence. An appropriate 
homologous portion in the second DNA sequence will be 
included to complete the targeting of the PNS vector. On the 
other hand, the entire first and/or second DNA sequence may 
comprise a second modifying DNA sequence when, for 
example, either or both of these DNA sequences encode for 
the correction of a genetic defect in the targeted DNA sequence.



As used herein, "modified target DNA sequence" refers to 
a DNA sequence in the genome of a targeted cell which has 
been modified by a PNS vector. Modified DNA sequences 
contain the substitution, insertion and/or deletion of one or 
more nucleotides in a first transformed target cell as com- 
pared to the cells from which such transformed target cells 
are derived. In some cases, modified target DNA sequences 
are referred to as "first" and/or "second modified target DNA 
sequences". These correspond to the DNA sequence found 
in the transformed target cell when a PNS vector containing 
a first or second modifying sequence is homologously 
integrated into the target DNA sequence. 

"Transformed target cells" sometimes referred to as "first
transformed target cells" refers to those target cells wherein 
the PNS vector has been homologously integrated into the 
target cell genome. A "transformed cell" on the other hand 
refers to a cell wherein the PNS vector has non-homologously
inserted into the genome randomly. "Transformed target
cells" generally contain a positive selection marker within 
the modified target DNA sequence. When the object of the 
genomic modification is to disrupt the expression of a 
particular gene, the positive selection marker is generally 
contained within an exon which effectively disrupts tran- 
scription and/or translation of the targeted endogenous gene. 
When, however, the object of the genomic modification is to
insert an exogenous gene or correct an endogenous gene
defect, the modified target DNA sequence in the first
transformed target cell will in addition contain exogenous
DNA sequences or endogenous DNA sequences corresponding to
those found in the normal, i.e., nondefective, endogenous
gene.

"Second transformed target cells" refers to first
transformed target cells whose genome has been subsequently
modified in a predetermined way. For example, the positive
selection marker contained in the genome of a first
transformed target cell can be excised by homologous
recombination to produce a second transformed target cell.
The details of such a predetermined genomic manipulation
will be described in more detail hereinafter.

As used herein, "heterologous DNA" refers to a DNA
sequence which is different from that sequence comprising
the target DNA sequence. Heterologous DNA differs from
target DNA by the substitution, insertion and/or deletion of
one or more nucleotides. Thus, an endogenous gene
sequence may be incorporated into a PNS vector to target its
insertion into a different regulatory region of the genome of
the same organism. The modified DNA sequence so
obtained is a heterologous DNA sequence. Heterologous
DNA sequences also include endogenous sequences which
have been modified to correct or introduce gene defects or
to change the amino acid sequence encoded by the
endogenous gene. Further, heterologous DNA sequences include
exogenous DNA sequences which are not related to
endogenous sequences, e.g. sequences derived from a different
species. Such "exogenous DNA sequences" include those
which encode exogenous polypeptides or exogenous
regulatory sequences. For example, exogenous DNA sequences
which can be introduced into murine or bovine ES cells for
tissue specific expression (e.g. in mammary secretory cells)
include human blood factors such as t-PA, Factor VIII,
serum albumin and the like. DNA sequences encoding
positive selection markers are further examples of
heterologous DNA sequences.

The PNS vector is used in the PNS method to select for
transformed target cells containing the positive selection
marker and against those transformed cells containing the
negative selection marker. Such positive-negative selection
procedures substantially enrich for those transformed target
cells wherein homologous recombination has occurred. As
used herein, "substantial enrichment" refers to at least a
two-fold enrichment of transformed target cells as compared
to the ratio of homologous transformants versus
nonhomologous transformants, preferably a 10-fold enrichment, more
preferably a 1000-fold enrichment, most preferably a
10,000-fold enrichment, i.e., the ratio of transformed target
cells to transformed cells. In some instances, the frequency
of homologous recombination versus random integration is
of the order of 1 in 1000 and in some cases as low as 1 in
10,000 transformed cells. The substantial enrichment
obtained by the PNS vectors and methods of the invention
often results in cell populations wherein about 1%, and more
preferably about 20%, and most preferably about 95% of the
resultant cell population contains transformed target cells
wherein the PNS vector has been homologously integrated.
Such substantially enriched transformed target cell
populations may thereafter be used for subsequent genetic
manipulation, for cell culture experiments or for the production of
transgenic organisms such as transgenic animals or plants.

FIGS. 2a and 2b show the consequences of gene targeting
(homologous recombination) and random integration of a
PNS vector into the genome of a target cell. The PNS vector
shown contains a neomycin resistance gene as a positive
selection marker (neo^r) and a herpes simplex virus
thymidine kinase (HSV-tk) gene as a negative selection marker.
The neo^r positive selection marker is positioned in an exon
of gene X. This positive selection marker is constructed such
that its expression is under the independent control of
appropriate regulatory sequences. Such regulatory
sequences may be endogenous to the host cell in which case
they are preferably derived from genes actively expressed in
the cell type. Alternatively, such regulatory sequences may be
inducible to permit selective activation of expression of the
positive selection marker.

On each side of the neo^r marker are DNA sequences
homologous to the regions 5' and 3' from the point of neo^r
insertion in the exon sequence. These flanking homologous
sequences target the X gene for homologous recombination
with the PNS vector. Consistent with the above description
of the PNS vector, the negative selection marker HSV-tk is
situated outside one of the regions of homology. In this
example it is 3' to the transcribed region of gene X. The neo^r
gene confers resistance to the drug G418 (G418^R) whereas
the presence of the HSV-tk gene renders cells containing this
gene sensitive to gancyclovir (GANC^S). When the PNS
vector is randomly inserted into the genome by a mechanism
other than by homologous recombination (FIG. 2b),
insertion is most frequently via the ends of the linear DNA and
thus the phenotype for such cells is neo^+ HSV-tk^+ (G418^R,
GANC^S). When the PNS vector is incorporated into the
genome by homologous recombination as in FIG. 2a, the
resultant phenotype is neo^+, HSV-tk^- (G418^R, GANC^R).
Thus, those cells wherein random integration of the PNS
vector has occurred can be selected against by treatment
with GANC. Those remaining transformed target cells
wherein homologous recombination has been successful can
then be selected on the basis of neomycin resistance and
GANC resistance. It, of course, should be apparent that the
order of selection for and selection against a particular
genotype is not important and that in some instances positive
and negative selection can occur simultaneously.

As indicated, the neomycin resistance gene in FIG. 2 is
incorporated into an exon of gene X. As so constructed, the
integration of the PNS vector by way of homologous
recombination effectively blocks the expression of gene X. In
multicellular organisms, however, integration is
predominantly random and occurs, for the most part, outside of the
region of the genome encoding gene X. Non-homologous
recombination therefore will not disrupt gene X in most
instances. The resultant phenotypes therefore, in
addition to the foregoing, will also be X^- for homologous
recombination and X^+ for random integration. In many cases
it is desirable to disrupt genes by positioning the positive
selection marker in an exon of a gene to be disrupted or
modified. For example, specific proto-oncogenes can be
mutated by this method to produce transgenic animals. Such
transgenic animals containing selectively inactivated
proto-oncogenes are useful in dissecting the genetic contribution
of such a gene to oncogenesis and in some cases normal
development.

Another potential use for gene inactivation is disruption
of proteinaceous receptors on cell surfaces. For example,
cell lines or organisms wherein the expression of a putative
viral receptor has been disrupted using an appropriate PNS
vector can be assayed with virus to confirm that the receptor
is, in fact, involved in viral infection. Further, appropriate
PNS vectors may be used to produce transgenic animal
models for specific genetic defects. For example, many gene
defects have been characterized by the failure of specific
genes to express functional gene product, e.g. α and β
thalassemia, hemophilia, Gaucher's disease and defects
affecting the production of α-1-antitrypsin, ADA, PNP,
phenylketonuria, familial hypercholesterolemia and
retinoblastoma. Transgenic animals containing disruption of one
or both alleles associated with such disease states or
modification to encode the specific gene defect can be used as
models for therapy. For those animals which are viable at
birth, experimental therapy can be applied. When, however,
the gene defect affects survival, an appropriate generation
(e.g. F0, F1) of transgenic animal may be used to study in
vivo techniques for gene therapy.

A modification of the foregoing means to disrupt gene X 
by way of homologous integration involves the use of a 
positive selection marker which is deficient in one or more 
regulatory sequences necessary for expression. The PNS
vector is constructed so that part but not all of the regulatory 
sequences for gene X are contained in the PNS vector 5' 
from the structural gene segment encoding the positive 
selection marker, e.g., homologous sequences encoding part 
of the promoter of the X gene. As a consequence of this
construction, the positive selection marker is not functional
in the target cell until such time as it is homologously
integrated into the promoter region of gene X. When so
integrated, gene X is disrupted and such cells may be
selected by way of the positive selection marker expressed
under the control of the target gene promoter. The only 
limitation in using such an approach is the requirement that 
the targeted gene be actively expressed in the cell type used. 
Otherwise, the positive selection marker will not be 
expressed to confer a positive selection characteristic on the
cell. 
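
The dependence of a regulatory-sequence-deficient positive selection marker on the target gene can be summarized as a small truth table. The sketch below is a simplification offered for illustration: the function and parameter names are hypothetical, and it ignores the possibility that a randomly integrated marker lands near some other active promoter by chance.

```python
def marker_expressed(integrated_at_target_promoter, target_gene_active):
    """Expression rule for a positive marker lacking its own promoter.

    In this simplified model the marker is expressed only when homologous
    integration has placed it under the target gene promoter AND that
    promoter is active in the cell type used.
    """
    return integrated_at_target_promoter and target_gene_active


# Enumerate all four combinations of the two conditions.
table = {
    (at_promoter, active): marker_expressed(at_promoter, active)
    for at_promoter in (True, False)
    for active in (True, False)
}

for (at_promoter, active), expressed in table.items():
    print(at_promoter, active, expressed)
```

Only the combination of homologous integration and an actively expressed target gene yields a selectable cell, which is the limitation noted above.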

In many instances, the disruption of an endogenous gene 
is undesirable, e.g., for some gene therapy applications. In 
such situations, the positive selection marker comprising the 
third DNA sequence of the PNS vector may be positioned
within an untranslated sequence, e.g. an intron of the target
DNA or 5' or 3' untranslated regions. FIG. 3 depicts such a
PNS vector. As indicated, the first DNA sequence comprises 
part of exon I and a portion of a contiguous intron in the 
target DNA. The second DNA sequence encodes an adjacent
portion of the same intron and optionally may include all or
a portion of exon II. The positive selection marker of the
third DNA sequence is positioned between the first and 
second sequences. The fourth DNA sequence encoding the 
negative selection marker, of course, is positioned outside of
the region of homology. When the PNS vector is integrated 
into the target DNA by way of homologous recombination 
the positive selection marker is located in the intron of the 
targeted gene. The third DNA sequence is constructed such
that it is capable of being expressed and translated independently
of the targeted gene. Thus, it contains an independent
functional promoter, translation initiation sequence, translation
termination sequence, and in some cases a polyadenylation
sequence and/or one or more enhancer sequences,
each functional in the cell type transfected with the PNS
vector. In this manner, cells incorporating the PNS vector by
way of homologous recombination can be selected by way
of the positive selection marker without disruption of the
endogenous gene. Of course, the same regulatory sequences
can be used to control the expression of the positive selection
marker when it is positioned within an exon. Further,
such regulatory sequences can be used to control expression
of the negative selection marker. Regulatory sequences
useful in controlling the expression of positive and/or negative
selection markers are listed in Table IIB. Of course,
other regulatory sequences may be used which are known to
those skilled in the art. In each case, the regulatory
sequences will be properly aligned and, if necessary, placed
in proper reading frame with the particular DNA sequence to
be expressed. Regulatory sequences, e.g. enhancers and promoters
from different sources, may be combined to provide
modulated gene expression.

TABLE IIA

Tissue Specific Regulatory Sequences

Cell/Tissue          Promoter                      Reference

Adrenal              PNMT                          Baetge, et al. (1988) PNAS 85
Erythroid                                          Townes et al. (1985) EMBO J 4:1715
                     α-crystallin                  Overbeek et al. (1985) PNAS 82:7815
                     α-FP                          Krumlauf et al. (1985) MCB 5:1639
Lymphoid             Igμ (γ-1) promoter/enhancer   Yamamura et al. (1986) PNAS 83:2152
                     WAP                           Gordon et al. (1987) Bio/Tech 5:1183
                     MBP                           Tamura et al. (1989) MCB 9:3122
Pancreas (B)         Insulin                       Hanahan (1985) Nature 315:115
Pancreas (exocrine)                                Swift et al. (1984) Cell 38:639
Pituitary            Prolactin                     Ingraham et al. (1988) Cell 55:579
Skeletal                                           Johnson et al. (1989) MCB 9:3393
Testes               Protamine                     Stewart et al. (1988) MCB 8:1748


TABLE IIB

Regulatory Sequences for Use With
Positive and/or Negative Selection Markers

PYF441 enhancer/HSV-tk promoter (pMC1-Neo control)
ASV-LTR
SFFV
Mannopine synthase
Octopine synthase
Nopaline synthase
Cauliflower mosaic virus 35S
β-phaseolin




A modification of the target DNA sequence is also shown
in FIG. 3. In exon I of the target DNA sequence, the sixth
codon GTG is shown which encodes valine. In the first DNA
sequence of the PNS vector, the codon GAG replaces the
GTG codon in exon I. This latter codon encodes glutamic acid.
Cells selected for homologous recombination as a consequence
encode a modified protein wherein the amino acid
encoded by the sixth codon is changed from valine to
glutamic acid.
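
The sixth-codon replacement described for FIG. 3 can be sketched in a few lines of Python. This is an illustrative sketch only: the seven-codon fragment, the function names and the partial codon table are assumptions introduced here, and only the GTG to GAG change at codon 6 comes from the text. Under the standard genetic code, GTG encodes valine and GAG encodes glutamic acid.

```python
# Partial codon lookup (standard genetic code), limited to the codons
# relevant to this illustration. It is deliberately not a full table.
CODON_TABLE = {
    "GTG": "Val",
    "GAG": "Glu",
    "GAA": "Glu",
    "CAG": "Gln",
    "CAA": "Gln",
}


def codon_at(coding_seq, codon_number):
    """Return the codon at a 1-based codon position."""
    start = 3 * (codon_number - 1)
    return coding_seq[start:start + 3]


def substitute_codon(coding_seq, codon_number, new_codon):
    """Return coding_seq with the 1-based codon_number replaced by new_codon."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three nucleotides")
    start = 3 * (codon_number - 1)
    if start + 3 > len(coding_seq):
        raise ValueError("codon_number lies outside the sequence")
    return coding_seq[:start] + new_codon + coding_seq[start + 3:]


# Hypothetical fragment: only codon 6 (GTG) reflects the text; the
# remaining positions are placeholders ("N") and carry no meaning.
target = "NNN" * 5 + "GTG" + "NNN"
modified = substitute_codon(target, 6, "GAG")

print(CODON_TABLE[codon_at(target, 6)], "->", CODON_TABLE[codon_at(modified, 6)])
```

The substitution leaves the length of the coding sequence unchanged, so the reading frame downstream of codon 6 is preserved.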

There are, of course, numerous other examples of modi- 
fications of target DNA sequences in the genome of the cell 
which can be obtained by the PNS vectors and methods of 
the invention. For example, endogenous regulatory 
sequences controlling the expression of proto-oncogenes 
can be replaced with regulatory sequences such as promoters 
and/or enhancers which actively express a particular gene in 
a specific cell type in an organism, i.e., tissue-specific
regulatory sequences. In this manner, the expression of a
proto-oncogene in a particular cell type, for example in a
transgenic animal, can be controlled to determine the effect
of oncogene expression in a cell type which does not
normally express the proto-oncogene. Alternatively, known
viral oncogenes can be inserted into specific sites of the
target genome to bring about tissue-specific expression of
the viral oncogene. Examples of preferred tissue-specific
regulatory sequences are listed in Table IIA. Examples of
proto-oncogenes which may be modified by the PNS vectors
and methods to produce tissue specific expression and viral
oncogenes which may be placed under control of endogenous
regulatory sequences are listed in Tables IIIA and IIIB,
respectively.

TABLE IIIA

chronic myelogenous leukemia

TABLE IIIB

HPV-E6
HPV-E7
PyTag
SV40Tag
v-abl
v-fps



As indicated, the positive-negative selection methods and 
vectors of the invention are used to modify target DNA 
sequences in the genome of target cells capable of homologous
recombination. Accordingly, the invention may be
practiced with any cell type which is capable of homologous
recombination. Examples of such target cells include cells
derived from vertebrates including mammals such as
humans, bovine species, ovine species, murine species,
simian species, and other eucaryotic organisms such as
filamentous fungi, and higher multicellular organisms such
as plants. The invention may also be practiced with lower
organisms such as gram positive and gram negative bacteria
capable of homologous recombination. However, such
lower organisms are not preferred because they generally do
not demonstrate significant non-homologous recombination, 
i.e., random integration. Accordingly, there is little or no 
need to select against non-homologous transformants. 

In those cases where the ultimate goal is the production of 
a non-human transgenic animal, embryonic stem cells (ES
cells) are preferred target cells. Such cells have been
manipulated to introduce transgenes. ES cells are obtained
from pre-implantation embryos cultured in vitro. Evans, M.
J., et al. (1981), Nature, 292, 154-156; Bradley, M. O., et al.
(1984), Nature, 309, 255-258; Gossler, et al. (1986), Proc.
Natl. Acad. Sci. U.S.A., 83, 9065-9069; and Robertson, et al.
(1986), Nature, 322, 445-448. PNS vectors can be efficiently
introduced into the ES cells by electroporation or
microinjection or other transformation methods, preferably 
electroporation. Such transformed ES cells can thereafter be 
combined with blastocysts from a non-human animal. The 
ES cells thereafter colonize the embryo and can contribute 
to the germ line of the resulting chimeric animal. For review 
see Jaenisch, R. (1988), Science, 240, 1468-1474. In the 
present invention, PNS vectors are targeted to a specific 
portion of the ES cell genome and thereafter used to generate
chimeric transgenic animals by standard techniques. 

When the ultimate goal is gene therapy to correct a 
genetic defect in an organism such as a human being, the cell 
type will be determined by the etiology of the particular 
disease and how it is manifested. For example, hematopoietic
stem cells are preferred cells for correcting genetic defects
in cell types which differentiate from such stem cells, e.g.
erythrocytes and leukocytes. Thus, genetic defects in globin
chain synthesis in erythrocytes such as sickle cell anemia,
β-thalassemia and the like may be corrected by using the
PNS vectors and methods of the invention with hematopoietic
stem cells isolated from an affected patient. For
example, if the target DNA in FIG. 3 is the sickle-cell
β-globin gene contained in a hematopoietic stem cell and the
PNS vector in FIG. 3 is targeted for this gene with the
modification shown in the sixth codon, transformed hematopoietic
stem cells can be obtained wherein a normal β-globin
will be expressed upon differentiation. After correction of
the defect, the hematopoietic stem cells may be returned to
the bone marrow or systemic circulation of the patient to
form a subpopulation of erythrocytes containing normal
hemoglobin. Alternatively, hematopoietic stem cells may be 
destroyed in the patient by way of irradiation and/or che- 
motherapy prior to reintroduction of the modified hemato- 
poietic stem cell thereby completely rectifying the defect. 

Other types of stem cells may be used to correct the 
specific gene defects associated with cells derived from such 
stem cells. Such other stem cells include epithelial, liver, 
lung, muscle, endothelial, mesenchymal, neural and bone stem
cells. Table IV identifies a number of known genetic defects
which are amenable to correction by the PNS methods and 
vectors of the invention. 

Alternatively, certain disease states can be treated by 
modifying the genome of cells in a way which does not 
correct a genetic defect per se but provides for the supple- 
mentation of the gene product of a defective gene. For 
example, endothelial cells are preferred as targets for human 
gene therapy to treat disorders affecting factors normally 
present in the systemic circulation. In model studies using 
both dogs and pigs endothelial cells have been shown to 
form primary cultures, to be transformable with DNA in 
culture, and to be capable of expressing a transgene upon 
re-implantation in arterial grafts into the host organism. 
Wilson, et al. (1989), Science, 244, 1344; Nabel, et al. 
(1989), Science, 244, 1342. Since endothelial cells form an 
integral part of the graft, such transformed cells can be used 
to produce proteins to be secreted into the circulatory system 
and thus serve as therapeutic agents in the treatment of 
genetic disorders affecting circulating factors. Examples of 
such diseases include insulin-deficient diabetes, α-1-antitrypsin
deficiency, and hemophilia. Endothelial cells provide a
particular advantage in the treatment of factor VIII-deficient
hemophilia. These cells naturally produce von Willebrand
factor (vWF) and it has been shown that production of active factor
VIII is dependent upon the autonomous synthesis of vWF
(Toole, et al. (1986), Proc. Natl. Acad. Sci. U.S.A., 83,
5939).

As indicated in Example 4, human endothelial cells from 
a hemophiliac patient deficient in Factor VIII are modified
by a PNS vector to produce an enriched population of
transformed endothelial cells wherein the expression of
DNA sequences encoding a secretory form of Factor VIII is
placed under the control of the regulatory sequences of the
endogenous β-actin gene. Such transformed cells are
implanted into vascular grafts from the patient. After incorporation
of the transformed cells, the graft is returned to the
vascular system of the patient. The transformed cells secrete
Factor VIII into the vascular system to supplement the
defect in the patient's blood clotting system.

Other diseases of the immune and/or the circulatory 
system are candidates for human gene therapy. The target 
tissue, bone marrow, is readily accessible by current tech- 
nology, and advances are being made in culturing stem cells 
in vitro. The immune deficiency diseases caused by mutations
in the enzymes adenosine deaminase (ADA) and
purine nucleoside phosphorylase (PNP) are of particular
interest. Not only have the genes been cloned, but cells
corrected by PNS gene therapy are likely to have a selective
advantage over their mutant counterparts. Thus, ablation of
the bone marrow in recipient patients may not be necessary.

The PNS approach is applicable to genetic disorders with 
the following characteristics: first, the DNA sequence and 
preferably the cloned normal gene must be available; second,
the appropriate, tissue relevant, stem cell or other
appropriate cell must be available. Below is Table IV listing
some of the known genetic diseases, the name of the cloned
gene, and the tissue type in which therapy may be appropriate.
These and other genetic diseases amenable to the PNS
methods and vectors of the invention have been reviewed.
See Friedman (1989), Science, 244, 1275; Nichols, E. K.
(1988), Human Gene Therapy (Harvard University Press);
and Cold Spring Harbor Symposium on Quantitative Biology,
Vol. 11 (1986), "The Biology of Homo Sapiens" (Cold
Spring Harbor Press).

TABLE IV

Gaucher Disease
Granulocyte Actin Deficiency
Muscular Dystrophy
Phenylketonuria
Purine nucleoside
most likely dystrophin gene



As indicated, genetic defects may be corrected in specific
cell lines by positioning the positive selection marker (the
third DNA sequence in the PNS vector) in an untranslated
region such as an intron near the site of the genetic defect
together with flanking segments to correct the defect. In this
approach, the positive selection marker is under its own
regulatory control and is capable of expressing itself without
substantially interfering with the expression of the targeted
gene. In the case of human gene therapy, it may be desirable
to introduce only those DNA sequences which are necessary 
to correct the particular genetic defect. In this regard, it is 
desirable, although not necessary, to remove the residual 
positive selection marker which remains after correction of 
the genetic defect by homologous recombination. 

The removal of a positive selection marker from a 
genome in which homologous insertion of a PNS vector has 
occurred can be accomplished in many ways. For example, 
the PNS vector can include a second negative selection 
marker contained within the third DNA sequence. This 
second negative selection marker is different from the first 
negative selection marker contained in the fourth DNA 
sequence. After homologous integration, a second modified 
target DNA sequence is formed containing the third DNA 
sequence encoding both the positive selection marker and 
the second negative selection marker. After isolation and 
purification of the first transformed target cells by way of 
negative selection against transformed cells containing the 
first negative selection marker and for those cells containing 
the positive selection marker, the first transformed target 
cells are subjected to a second cycle of homologous recom- 
bination. In this second cycle, a second homologous vector 
is used which contains all or part of the first and second DNA 
sequence of the PNS vector (encoding the second modifi- 
cation in the target DNA) but not those sequences encoding 
the positive and second negative selection markers. The 
second negative selection marker in the first transformed 
target cells is then used to select against unsuccessful 
transformants and cells wherein the second homologous 
vector is randomly integrated into the genome. Homologous 
recombination of this second homologous vector, however, 
with the second modified target DNA sequence results in a 
second transformed target cell type which does not contain 
either the positive selection marker or the second negative 
selection marker but which retains the modification encoded 
by the first and/or second DNA sequences. Cells which have
not homologously integrated the second homologous vector 
are selected against using the second negative selection 
marker. 
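
The logic of the two selection cycles described above can be modelled as simple set filtering. The following Python sketch is illustrative only: the marker labels ("pos", "neg1", "neg2"), the cell-population names and the survives function are hypothetical names introduced here, and the model ignores selection efficiency and escape frequencies.

```python
def survives(cell_markers, positive=(), negative=()):
    """A cell survives if it carries every positively selected marker
    and none of the negatively selected markers."""
    markers = set(cell_markers)
    return set(positive) <= markers and markers.isdisjoint(negative)


# Cycle 1: the PNS vector carries "pos" and "neg2" (third DNA sequence)
# and "neg1" (fourth DNA sequence, outside the region of homology).
cycle1 = {
    "untransformed": set(),
    "random_integration": {"pos", "neg1", "neg2"},   # whole vector retained
    "homologous_integration": {"pos", "neg2"},       # neg1 is lost
}
first_transformed = [
    name for name, markers in cycle1.items()
    if survives(markers, positive={"pos"}, negative={"neg1"})
]

# Cycle 2: the second homologous vector carries no selection markers.
# Only homologous replacement removes "pos" and "neg2" from the genome.
cycle2 = {
    "no_second_event": {"pos", "neg2"},
    "second_vector_random": {"pos", "neg2"},
    "second_vector_homologous": set(),
}
second_transformed = [
    name for name, markers in cycle2.items()
    if survives(markers, negative={"neg2"})
]

print(first_transformed)
print(second_transformed)
```

In this model only the homologous integrants pass each cycle, which is the outcome the second negative selection marker is intended to produce.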

The PNS vectors and methods of the invention are also 
applicable to the manipulation of plant cells and ultimately 
the genome of the entire plant. A wide variety of transgenic 
plants have been reported, including herbaceous dicots, 
woody dicots and monocots. For a summary, see Gasser, et 
al. (1989), Science, 244, 1293-1299. A number of different 
gene transfer techniques have been developed for producing 
such transgenic plants and transformed plant cells. One 
technique used Agrobacterium tumefaciens as a gene trans- 
fer system. Rogers, et al. (1986), Methods Enzymol, 118, 
627-640. A closely related transformation utilizes the bac- 
terium Agrobacterium rhizogenes. In each of these systems 
a Ti or Ri plant transformation vector can be constructed 
containing border regions which define the DNA sequence 
to be inserted into the plant genome. These systems previously
have been used to randomly integrate exogenous DNA
into plant genomes. In the present invention, an appropriate
PNS vector may be inserted into the plant transformation 
vector between the border sequences defining the DNA 
sequences transferred into the plant cell by the Agrobacte- 
rium transformation vector. 

Preferably, the PNS vector of the invention is directly 
transferred to plant protoplasts by way of methods analogous
to those previously used to introduce transgenes into
protoplasts. See, e.g. Paszkowski, et al. (1984), EMBO J., 3,
2717-2722; Hain, et al. (1985), Mol. Gen. Genet., 199,
161-168; Shillito, et al. (1985), Bio/Technology, 3,
1099-1103; and Negrutiu, et al. (1987), Plant Mol. Bio., 8,
363-373. Alternatively, the PNS vector is contained within
a liposome which may be fused to a plant protoplast (see,
e.g. Deshayes, et al. (1985), EMBO J., 4, 2731-2738) or is
directly inserted into a plant protoplast by way of intranuclear
microinjection (see, e.g. Crossway, et al. (1986), Mol. Gen.
Genet., 202, 179-185, and Reich, et al. (1986), Bio/Technology,
4, 1001-1004). Microinjection is the preferred
method for transfecting protoplasts. PNS vectors may also
be microinjected into meristematic inflorescences. De la
Pena et al. (1987), Nature, 325, 274-276. Finally, tissue
explants can be transfected by way of a high velocity
microprojectile coated with the PNS vector analogous to the
methods used for insertion of transgenes. See, e.g. Vasil
(1988), Bio/Technology, 6, 397; Klein, et al. (1987), Nature,
327, 70; Klein, et al. (1988), Proc. Natl. Acad. Sci. U.S.A.,
85, 8502; McCabe, et al. (1988), Bio/Technology, 6, 923; and
Klein, et al., Genetic Engineering, Vol 11, J. K. Setlow
editor (Academic Press, N.Y., 1989). Such transformed
explants can be used to regenerate, for example, various cereal
crops. Vasil (1988), Bio/Technology, 6, 397.

Once the PNS vector has been inserted into the plant cell 
by any of the foregoing methods, homologous recombina- 
tion targets the PNS vector to the appropriate site in the plant 
genome. Depending upon the methodology used to transfect, 
positive-negative selection is performed on tissue cultures of
the transformed protoplast or plant cell. In some instances, 
cells amenable to tissue culture may be excised from a 
transformed plant either from the F0 or a subsequent gen- 
eration. 

The PNS vectors and methods of the invention are used to
precisely modify the plant genome in a predetermined way.
Thus, for example, herbicide, insect and disease resistance
may be predictably engineered into a specific plant species
to provide, for example, tissue specific resistance, e.g.,
insect resistance in leaf and bark. Alternatively, the expression
levels of various components within a plant may be
modified by substituting appropriate regulatory elements to
change the fatty acid and/or oil content in seed, the starch
content within the plant and the elimination of components
contributing to undesirable flavors in food. Alternatively,
heterologous genes may be introduced into plants under the 
predetermined regulatory control in the plant to produce 
various hydrocarbons including waxes and hydrocarbons 
used in the production of rubber. 

The amino acid composition of various storage proteins in
wheat and corn, for example, which are known to be
deficient in lysine and tryptophan may also be modified.
PNS vectors can be readily designed to alter specific codons
within such storage proteins to encode lysine and/or tryptophan
thereby increasing the nutritional value of such
crops. For example, the zein protein in corn (Pederson et al.
(1982), Cell, 29, 1015) may be modified to have a higher 
content of lysine and tryptophan by the vectors and methods 
of the invention. 

It is also possible to modify the levels of expression of
various positive and negative regulatory elements controlling
the expression of particular proteins in various cells and
organisms. Thus, the expression level of negative regulatory
elements may be decreased by use of an appropriate promoter
to enhance the expression of a particular protein or
proteins under control of such a negative regulatory element.
Alternatively, the expression level of a positive regulatory
protein may be increased to enhance expression of the
regulated protein or decreased to reduce the amount of
regulated protein in the cell or organism.

The basic elements of the PNS vectors of the invention 
have already been described. The selection of each of the 
DNA sequences comprising the PNS vector, however, will
depend upon the cell type used, the target DNA sequence to 
be modified and the type of modification which is desired. 

Preferably, the PNS vector is a linear double stranded 
DNA sequence. However, circular closed PNS vectors may 
also be used. Linear vectors are preferred since they enhance 
the frequency of homologous integration into the target 
DNA sequence. Thomas, et al. (1986), Cell, 44, 49. 

In general, the PNS vector (including first, second, third 
and fourth DNA sequences) has a total length of between 2.5 
kb (2500 base pairs) and 1000 kb. The lower size limit is set 
by two criteria. The first of these is the minimum necessary 
length of homology between the first and second sequences 
of the PNS vector and the target locus. This minimum is 
approximately 500 bp (DNA sequence 1 plus DNA sequence 
2). The second criterion is the need for functional genes in 
the third and fourth DNA sequences of the PNS vector. For 
practical reasons, this lower limit is approximately 1000 bp 
for each sequence. This is because the smallest DNA 
sequences encoding known positive and negative selection 
markers are about 1.0-1.5 kb in length. 

The upper limit to the length of the PNS vector is 
determined by the state of the technology used to manipulate 
DNA fragments. If these fragments are propagated as bacterial
plasmids, a practical upper length limit is about 25 kb;
if propagated as cosmids, the limit is about 50 kb; if
propagated as YACs (yeast artificial chromosomes) the limit
approaches 1000 kb (Burke, et al. (1987), Science, 236, 
806). 
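
The size limits stated above reduce to simple arithmetic, which the following Python sketch makes explicit. The constant and function names are hypothetical and introduced only for illustration; the numerical values (about 500 bp of combined homology, about 1000 bp for each selection marker, and upper limits of about 25 kb, 50 kb and 1000 kb) are those given in the text.

```python
# Lower-limit components, in base pairs, as stated in the text.
MIN_HOMOLOGY_BP = 500    # DNA sequence 1 plus DNA sequence 2
MIN_MARKER_BP = 1000     # each of DNA sequences 3 and 4

# Approximate practical upper limits, in kilobases, by propagation vehicle.
UPPER_LIMIT_KB = {"plasmid": 25, "cosmid": 50, "YAC": 1000}


def minimum_vector_length_bp():
    """Combined homology plus one positive and one negative marker."""
    return MIN_HOMOLOGY_BP + 2 * MIN_MARKER_BP


def smallest_vehicle(total_length_kb):
    """Return the smallest listed vehicle able to carry the vector,
    or None if the vector exceeds every listed limit."""
    for name, limit in sorted(UPPER_LIMIT_KB.items(), key=lambda kv: kv[1]):
        if total_length_kb <= limit:
            return name
    return None


print(minimum_vector_length_bp())   # 500 + 2 * 1000
print(smallest_vehicle(30))
```

The sum of the lower-limit components, 500 bp + 1000 bp + 1000 bp, gives the 2.5 kb minimum total length stated for the PNS vector.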

Within the first and second DNA sequences of the PNS 
vector are portions of DNA sequence which are substantially 
homologous with sequence portions contained within the 
first and second regions of the target DNA sequence. The 
degree of homology between the vector and target sequences 
influences the frequency of homologous recombination 
between the two sequences. One hundred percent sequence 
homology is most preferred; however, lower sequence
homology can be used to practice the invention. Thus, 
sequence homology as low as about 80% can be used. A 
practical lower limit to sequence homology can be defined 
functionally as that amount of homology which if further 
reduced does not mediate homologous integration of the 
PNS vector into the genome. Although as few as 25 bp of 
100% homology are required for homologous recombina- 
tion in mammalian cells (Ayares, et al. (1986), Genetics, 83, 
5199-5203), longer regions are preferred, e.g., 500 bp, more 
preferably, 5000 bp, and most preferably, 25000 bp for each 
homologous portion. These numbers define the limits of the 
individual lengths of the first and second sequences. Pref- 
erably, the homologous portions of the PNS vector will be 
100% homologous to the target DNA sequence, as increas- 
ing the amount of non-homology will result in a correspond- 
ing decrease in the frequency of gene targeting. If non- 
homology does exist between the homologous portion of the 
PNS vector and the appropriate region of the target DNA, it 
is preferred that the non-homology not be spread throughout 
the homologous portion but rather in discrete areas of the 
homologous portion. It is also preferred that the homologous 
portion of the PNS vector adjacent to the negative selection 
marker (fourth or fifth DNA sequence) be 100% homolo- 
gous to the corresponding region in the target DNA. This is 
to ensure maximum discontinuity between homologous and 
non-homologous sequences in the PNS vector. 
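
As a rough illustration of the homology figures discussed above, percent identity between a homologous portion of a vector and the corresponding target region can be computed as follows. This Python sketch assumes two pre-aligned, gap-free sequences of equal length, which is a simplification; the function names and the example sequences are hypothetical. The 80% threshold is the approximate lower value mentioned in the text.

```python
def percent_identity(vector_seq, target_seq):
    """Percent of positions that match between two pre-aligned,
    equal-length, gap-free sequences."""
    if not vector_seq or len(vector_seq) != len(target_seq):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(vector_seq.upper(), target_seq.upper()))
    return 100.0 * matches / len(vector_seq)


def mismatch_positions(vector_seq, target_seq):
    """0-based positions at which the two sequences differ, useful for
    checking whether non-homology is clustered or spread out."""
    return [i for i, (a, b) in enumerate(zip(vector_seq.upper(), target_seq.upper()))
            if a != b]


def meets_homology(vector_seq, target_seq, threshold=80.0):
    """True if percent identity is at or above the threshold."""
    return percent_identity(vector_seq, target_seq) >= threshold


# Hypothetical 10-nucleotide example with a single terminal mismatch.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(mismatch_positions("ACGTACGTAC", "ACGTACGTAA"))
```

The list of mismatch positions gives a simple way to see whether non-homology falls in discrete areas, as the text prefers, rather than being spread throughout the homologous portion.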

Increased frequencies of homologous recombination have 
been observed when the absolute amount of DNA sequence 
in the combined homologous portions of the first and second 
DNA sequence are increased. FIG. 4 depicts the targeting 
frequency of the Hprt locus as a function of the extent of
homology between an appropriate PNS vector and the
endogenous target. A series of replacement (A) and insertion
(•) Hprt vectors were constructed that varied in the extent
of homology to the endogenous Hprt gene. Hprt sequences
in each vector were interrupted in the eighth exon with the
neomycin resistance gene. The amount of Hprt sequence 3'
to the neo gene was kept constant and the amount of Hprt
sequence 5' to the neo gene was varied. The absolute frequency of
independent targeting events per total ES cells electroporated
is plotted in FIG. 4 on a logarithmic scale as a
function of the number of kilobases of Hprt sequence 
contained within the PNS vectors. See Capecchi, M. R. 
(1989), Science, 244, 1288-1292. 

As previously indicated, the fourth DNA sequence containing
the negative selection marker should have sufficient
non-homology to the target DNA sequence to prevent
homologous recombination between the fourth DNA
sequence and the target DNA. This is generally not a
problem since it is unlikely that the negative selection
marker chosen will have any substantial homology to the
target DNA sequence. In any event, the sequence homology
between the fourth DNA sequence and the target DNA
sequence should be less than about 50%, most preferably
less than about 30%.

A preliminary assay for sufficient sequence non-homology
between the fourth DNA sequence and the target DNA
sequence utilizes standard hybridization techniques. For
example, the particular negative selection marker may be
appropriately labeled with a radioisotope or other detectable
marker and used as a probe in a Southern blot analysis of the
genomic DNA of the target cell. If little or no signal is
detected under intermediate stringency conditions such as
3×SSC when hybridized at about 55° C., that negative
selection marker should be functional in a PNS vector
designed for homologous recombination in that cell type.
However, even if a signal is detected, it is not necessarily
indicative that the particular negative selection marker cannot
be used in a PNS vector targeted for that genome. This is because the
negative selection marker may be hybridizing with a region
of the genome which is not in proximity with the target DNA
sequence. Since the target DNA sequence is defined as those
DNA sequences corresponding to first, second, third, and in
some cases, fourth regions of the genome, Southern blots
localizing the regions of the target DNA sequence may be
performed. If the probe corresponding to the particular
negative selection marker does not hybridize to these bands, 
it should be functional for PNS vectors directed to these 
regions of the genome. 

Hybridization between sequences encoding the negative
selection marker and the genome or target regions of a
genome, however, does not necessarily mean that such a
negative selection marker will not function in a PNS vector.
The hybridization assay is designed to detect those
sequences which should function in the PNS vector because
of their failure to hybridize to the target. Ultimately, a DNA
sequence encoding a negative selection marker is functional
in a PNS vector if it is not integrated during homologous
recombination regardless of whether or not it hybridizes
with the target DNA.

It is also possible that high stringency hybridization can 
be used to ascertain whether genes from one species can be 
targeted into related genes in a different species. For 
example, preliminary gene therapy experiments may require 
that human genomic sequences replace the corresponding 
related genomic sequence in mouse cells. High stringency
hybridization conditions such as 0.1×SSC at about 68° C.
can be used to correlate hybridization signal under such 
conditions with the ability of such sequences to act as 
homologous portions in the first and second DNA sequence 
of the PNS vector. Such experiments can be routinely 
performed with various genomic sequences having known 
differences in homology. The measure of hybridization may 
therefore correlate with the ability of such sequences to 
bring about acceptable frequencies of recombination. 

Table I identifies various positive and negative selection 
markers which may be used respectively in the third and 
fourth DNA sequences of the PNS vector together with the 
conditions used to select for or against cells expressing each 
of the selection markers. For animal cells such as mouse
L cells and ES cells, preferred positive selection markers
include DNA sequences encoding neomycin resistance and 
hygromycin resistance, most preferably neomycin resis- 
tance. For plant cells preferred positive selection markers 
include neomycin resistance and bleomycin resistance, most 
preferably neomycin resistance. 

For animal cells, preferred negative selection markers 
include gpt and HSV-tk, most preferably HSV-tk. For plant 
cells, preferred negative selection markers include gpt and
HSV-tk. As genes responsible for bacterial and fungal patho- 
genesis in plants are cloned, other negative markers will 
become readily available. 

As used herein, a "positive screening marker" refers to a 
DNA sequence used in a phage rescue screening method to 
detect homologous recombination. An example of such a 
positive screening marker is the supF gene which encodes a 
tyrosine transfer RNA which is capable of suppressing 
amber mutations. See Smithies, et al. (1985), Nature, 317, 
230-234. 

The following is presented by way of example and is not 
to be construed as a limitation on the scope of the invention. 

EXAMPLE 1 



Inactivation at the int-2 locus in mouse ES cells 
1. PNS Vector Construction 

The PNS vector, pINT-2-N/TK, is described in Mansour, 
et al. (1988), Nature, 336, 349. This vector was used to 
disrupt the proto-oncogene, INT-2, in mouse ES cells. As 
shown in FIG. 5c, it contains DNA sequences 1 and 2 
homologous to the target INT-2 genomic sequences in 
mouse ES cells. These homologous sequences were 
obtained from a plasmid referred to as pAT-153 (Peters, et al.
(1983), Cell, 33, 369). DNA sequence 3, the positive selection
moiety of the PNS vector, was the Neo gene from the
plasmid pMC1Neo described in Thomas, et al. (1987), Cell,
51, 503; DNA sequence 4, the negative selection element of
the vector, was the HSV-TK gene derived from the plasmid 
pIC-19-R/TK which is widely available in the scientific 
community. 

Plasmid pIC19R/MC1-TK (FIG. 5d) contains the HSV-TK
gene engineered for expression in ES cells (Mansour, et
al. (1988), Nature, 336, 348-352). The TK gene, flanked by
a duplication of a mutant polyoma virus enhancer, PYF441,
has been inserted into the vector, pIC19R (Marsh, et al.
(1984), Gene, 32, 481-485) between the XhoI and the
HindIII sites. The map of plasmid pIC19R/MC1-TK is
shown in FIG. 5d. The enhancer sequence is as follows:
CTCGAGCAGTGTGGTTTTCAAGAGGAAGCAAAAAGCCTCTCCACCCAGGC
CTGGAATGTTTCCACCCAATGTCGAGCAGTGTGGTTTTGCAAGAGGAAGC
AAAAAGCCTCTCCACCCAGGCCTGGAATGTTTCCACCCAATGTCGAG



The 5' end is an XhoI restriction enzyme site, the 3' end
is contiguous with the HSV-TK gene. The HSV-TK
sequences are from nucleotides 92-1799 (McKnight (1980),
Nucl. Acids. Res., 8, 5949-5964) followed at the 3' end by
a HindIII linker. The plasmid pIC19R is essentially identical
to the pUC vectors, with an alternative poly-linker as shown
in FIG. 5d.

Construction of the vector pINT-2-N/TK involved five
sequential steps as depicted in FIG. 5. First, a 3,965 bp PstI
fragment containing exon 1b was excised from pAT153 and
inserted into the PstI site of Bluescribe® (Stratagene of
La Jolla, Calif.), an Amp r bacterial plasmid containing a
multi-enzyme cloning polylinker. Second, a synthetic XhoI
linker of



    5'-GCTCGAGCGGCC-3'
       ||||||||
3'-CCGGCGAGCTCG-5'



was inserted into the ApaI site on exon 1b. Third, the
XhoI-SalI Neo r fragment from pMC1Neo was inserted into
the XhoI linker in exon 1b. Fourth, the 3,965 bp INT-2 PstI
fragment containing the Neo r gene was reinserted into
pAT153, to generate the plasmid pINT-2-N as shown in FIG.
5b. This plasmid also includes the third exon of the int-2
gene. Fifth, the ClaI-HindIII HSV-tk fragment from pIC-19-R/TK
was inserted into ClaI-HindIII digested pINT-2-N,
creating the final product, pINT-2-N/TK. This vector was
linearized by digestion with ClaI prior to its introduction
into ES cells.
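
The synthetic XhoI linker used in the second construction step can be sanity-checked with a few lines of code. The sketch below is illustrative only and is not part of the described protocol. It assumes the linker is a single 12-mer, GCTCGAGCGGCC, annealed to a second copy of itself, and that ApaI cuts GGGCC^C to leave a 4-nucleotide GGCC 3' overhang. The function names are hypothetical.

```python
# Illustrative check of the self-annealing XhoI linker (not from the source protocol).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def self_annealed_linker(oligo: str, overhang_len: int):
    """Split an oligo into a duplex core and a 3' overhang.

    Returns (core, overhang) if the core equals its own reverse complement,
    meaning two copies of the oligo can pair through the core and leave the
    same 3' overhang at both ends. Returns None otherwise.
    """
    core, overhang = oligo[:-overhang_len], oligo[-overhang_len:]
    if reverse_complement(core) == core:
        return core, overhang
    return None


LINKER = "GCTCGAGCGGCC"   # sequence given in the text
APAI_OVERHANG = "GGCC"    # assumption: ApaI cuts GGGCC^C, leaving 3'-GGCC
XHOI_SITE = "CTCGAG"

core, overhang = self_annealed_linker(LINKER, len(APAI_OVERHANG))
print("duplex core:", core)
print("3' overhang:", overhang)
print("core contains XhoI site:", XHOI_SITE in core)
print("overhang pairs with itself:", reverse_complement(overhang) == overhang)
```

Under these assumptions the 8 bp core carries the XhoI site and the GGCC overhangs match ApaI-cut ends, which is consistent with inserting the linker at the ApaI site in exon 1b.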

2. Generation of ES Cells
ES cells were derived from two sources. The first source
was isolation directly from C57Bl/6 blastocysts (Evans, et al.
(1981), Nature, 292, 154-156) except that primary embryonic
fibroblasts (Doetschman, et al. (1985), J. Embryol. Exp.
Morphol., 87, 27-45) were used as feeders rather than STO
cells. Briefly, 2.5 days postpregnancy, mice were ovariectomized,
and delayed blastocysts were recovered 4-6 days
later. The blastocysts were cultured on mitomycin C-inactivated
primary embryonic fibroblasts. After blastocyst
attachment and the outgrowth of the trophectoderm, the
ICM-derived clump was picked and dispersed by trypsin
into clumps of 3-4 cells and put onto new feeders. All
culturing was carried out in DMEM plus 20% FCS and
10^-4 M β-mercaptoethanol. The cultures were examined
daily. After 6-7 days in culture, colonies that still resembled
ES cells were picked, dispersed into single cells, and
replated on feeders. Those cell lines that retained the morphology
and growth characteristics of ES cells were tested for
pluripotency in vitro. These cell lines were maintained on
feeders and transferred every 2-3 days.

The second source was one of a number of ES
cell lines isolated in other laboratories, e.g., CC1.2
described by Kuehn, et al. (1987), Nature, 326, 295. The
cells were grown on mitomycin C-inactivated STO cells.
Cells from both sources behaved identically in gene targeting
experiments.

3. Introduction of PNS Vector pINT-2-N/TK into ES cells



The PNS vector pINT-2-N/TK was introduced into ES
cells by electroporation using the Promega Biotech X-Cell
2000. Rapidly growing cells were trypsinized, washed in
DMEM, counted and resuspended in buffer containing 20
mM HEPES (pH 7.0), 137 mM NaCl, 5 mM KCl, 0.7 mM
Na2HPO4, 6 mM dextrose, and 0.1 mM β-mercaptoethanol.
Just prior to electroporation, the linearized recombinant
vector was added. Approximately 25 µg of linearized PNS
vector was mixed with 10^7 ES cells in each 1 ml cuvette.

Cells and DNA were exposed to two sequential 625 V/cm
pulses at room temperature, allowed to remain in the buffer
for 10 minutes, then plated in non-selective media onto
feeder cells.
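
The buffer composition above can be converted into weighable amounts with simple arithmetic. The following is a minimal sketch, not part of the described procedure: the concentrations come from the text, while the molar masses are standard anhydrous values supplied here as assumptions. It does not cover pH adjustment, and the helper names are hypothetical.

```python
# Reagent masses for the electroporation buffer described in the text.
# Concentrations (mM) are from the text; molar masses are assumed
# standard anhydrous values, not taken from the source.
BUFFER_MM = {
    "HEPES": 20.0,
    "NaCl": 137.0,
    "KCl": 5.0,
    "Na2HPO4": 0.7,
    "dextrose": 6.0,
    "beta-mercaptoethanol": 0.1,
}

MOLAR_MASS = {  # g/mol (assumed values)
    "HEPES": 238.30,
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "dextrose": 180.16,
    "beta-mercaptoethanol": 78.13,
}


def grams_needed(conc_mm: float, molar_mass: float, volume_l: float) -> float:
    """Mass in grams for a millimolar concentration in a volume given in litres."""
    return conc_mm / 1000.0 * molar_mass * volume_l


def recipe(volume_l: float) -> dict:
    """Grams of each component for the requested buffer volume."""
    return {
        name: grams_needed(mm, MOLAR_MASS[name], volume_l)
        for name, mm in BUFFER_MM.items()
    }


one_litre = recipe(1.0)
for name, grams in one_litre.items():
    print(f"{name:22s} {grams:8.4f} g per litre")

# DNA-to-cell ratio stated in the text: 25 ug of vector per 10^7 cells.
ug_dna_per_million_cells = 25.0 / (1e7 / 1e6)
print(f"{ug_dna_per_million_cells} ug DNA per 10^6 cells")
```

Scaling to another volume only requires calling recipe with a different number of litres.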

4. Selection of ES Cells Containing a Targeted Disruption 
of the int-2 Locus 

Following two days of non-selective growth, the cells
were trypsinized and replated onto G418 (250 µg/ml) media.
The positive selection was applied alone for three days, at
which time the cells were again trypsinized and replated in
the presence of G418 and either gancyclovir (2×10^-6 M)
(Syntex, Palo Alto, Calif.) or 1-(2-deoxy-2-fluoro-β-D-arabinofuranosyl)-5-iodouracil
(F.I.A.U.) (1×10^ M) (Bristol
Myers). When the cells had grown to confluency, each plate
of cells was divided into two aliquots, one of which was
frozen in liquid N2, the other harvested for DNA analysis.
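
The logic of the combined selection can be summarized in a small model. This is a simplified sketch under stated assumptions, not the authors' procedure: each cell is reduced to two flags (whether it carries a functional Neo gene and whether it carries a functional HSV-tk gene), G418 is taken to kill cells lacking Neo, and gancyclovir or F.I.A.U. is taken to kill cells expressing HSV-tk. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    label: str
    has_neo: bool  # functional positive selection marker present
    has_tk: bool   # functional negative selection marker present


def survives_positive(cell: Cell) -> bool:
    """G418 selection: only cells expressing Neo survive."""
    return cell.has_neo


def survives_negative(cell: Cell) -> bool:
    """Gancyclovir or F.I.A.U. selection: cells expressing HSV-tk are killed."""
    return not cell.has_tk


def survives_pns(cell: Cell) -> bool:
    """Positive-negative selection: both conditions must hold."""
    return survives_positive(cell) and survives_negative(cell)


population = [
    Cell("no integration", has_neo=False, has_tk=False),
    Cell("random integration of whole vector", has_neo=True, has_tk=True),
    Cell("homologous recombination at target", has_neo=True, has_tk=False),
]

survivors = [cell.label for cell in population if survives_pns(cell)]
print(survivors)
```

In this idealized model only the targeted cells survive. In practice some random integrants lose or inactivate the tk gene and also survive, which is why the surviving cells are still examined by DNA analysis.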

5. Formation of INT-2 disrupted transgenic mice 
Those transformed cells determined to be appropriately 

modified by the PNS vector were grown in non-selective 
media for 2-5 days prior to injection into blastocysts accord- 
ing to the method of Bradley in Teratocarcinomas and 
embryonic stem cells, a practical approach, edited by E. J. 
Robertson, IRL Press, Oxford (1987), p. 125. 

Blastocysts containing the targeted ES cells were 
implanted into pseudo-pregnant females and allowed to 
develop to term. Chimaeric offspring were identified by 
coat-color markers and those males showing chimaerism 
were selected for breeding offspring. Those offspring which
carry the mutant allele can be identified by coat color, and
the presence of the mutant allele confirmed by tail-blot
DNA analysis.

EXAMPLE 2 

Disruption at the hox1.4 locus in mouse ES cells

Disruption of the hox1.4 locus was performed by methods
similar to those described to disrupt the int-2 locus. There
were two major differences between these two disruption
strategies. First, the PNS vector, pHOX1.4N/TK-TK2 (FIG.
6), used to disrupt the hox1.4 locus contained two negative
selection markers, i.e., a DNA sequence 5 encoding a second
negative selection marker was included on the PNS vector at
the end opposite to DNA sequence 4 encoding the first
negative selection marker. DNA sequence 5 contained the tk
gene isolated from HSV-type 2. It functioned as a negative-selectable
marker by the same method as the original
HSV-tk gene, but the two tk genes are 20% non-homologous.
This non-homology further inhibits recombination
between DNA sequences 4 and 5 in the vector which might
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have inhibited gene-targeting. The second difference 
between the int-2 and the hox1.4 disruption strategies is that
the vector pHOX1.4N/TK-TK2 contains a deletion of 1000
bp of hox1.4 sequences internal to the gene, i.e., DNA
sequences 1 and 2 are not contiguous.

The HSV-tk2 sequences used in this construction were
obtained from pDG504 (Swain, M. A. et al. (1983), J. Virol.,
46, 1045). The structural TK gene from pDG504 was
inserted adjacent to the same promoter/enhancer sequences
used to express both the Neo and HSV-tk genes, to generate
the plasmid pIC20H/TK2.

Construction of pHOX1.4N/TK-TK2 proceeded in five
sequential steps as depicted in FIG. 6. First, a clone containing
hox1.4 sequences was isolated from a genomic λ library.
The λ library was constructed by inserting EcoRI partially
digested mouse DNA into the λ-DASH® (Stratagene) cloning
phage. The hox1.4-containing phage were identified by
virtue of their homology to a synthetic oligonucleotide
synthesized from the published sequence of the hox1.4 locus
(Tournier-Lasserve, et al. (1989), Mol. Cell. Biol., 9, 2273).
Second, a 9 kb SalI-SpeI fragment containing the hox1.4
homeodomain was inserted into Bluescribe®. Third, a 1 kb
BglII fragment within the hox1.4 locus was replaced with the
Neo r gene isolated from pMC1 Neo, creating the plasmid
pHOX1.4N. Fourth, the XhoI-SalI fragment of HSV-tk from
pIC19R/TK was inserted into the SalI site of pHOX1.4N,
generating the plasmid pHOX1.4N/TK. Fifth, the SalI-SpeI
fragment from pHOX1.4N/TK was inserted into a SalI-XbaI
digest of the plasmid pIC20H/TK2, generating the final
product, pHOX1.4N/TK-TK2. This vector was digested
with SalI to form a linear PNS vector which was transfected
into mouse ES cells as described in Example 1. Positive-negative
selection and the method of forming transgenic
mice were also as described in Example 1. Southern blots of
somatic cells demonstrate that the disrupted hox1.4 gene was
transferred to transgenic offspring.

EXAMPLE 3 


Inactivation of Other Hox Genes 

The methods described in Examples 1 and 2 have also
been used to disrupt the hox1.3, hox1.6, hox2.3, and int-1 loci
in ES cells. The genomic sequences for each of these loci
(isolated from the same λ-DASH library containing the hox1.4
clone) were used to construct PNS vectors to target disruption
of these genes. All of these PNS vectors contain the
Neo r gene from pMC1-Neo as the positive selection marker
and the HSV-tk and HSV-tk2 sequences as negative selection
markers.



TABLE V

Other Murine Developmental Genes Inactivated by PNS

hox1.3   11 kb Xba-HindIII   Tournier-Lasserve, et al. (1989), MCB, 9, 2273   homeo-domain
hox1.6   13 kb partial RI    Baron, et al. (1987), EMBO, 6, 2977              BglII-site in
hox2.3   12 kb BamHI         Hart, et al. (1987), Genomics, 1, 182            BglII-site in
int-1    13 kb BglII
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EXAMPLE 4 
Vascular Graft Supplementing Factor VIII

In this example, a functional factor VIII gene is targeted
by a PNS vector to the β-actin locus in human endothelial
cells. When so incorporated, the expression of factor VIII is
controlled by the β-actin promoter, a promoter known to
function in nearly all somatic cells, including fibroblasts,
epithelial and endothelial cells. PNS vector construction is
as follows: In step 1A (FIG. 7A), the 13.8 kb EcoRI fragment
containing the entire human β-actin gene from the λ-phage,
14TB (Leavitte, et al. (1984), Mol. Cell. Biol., 4, 1961) is
inserted, using synthetic EcoRI/XhoI adaptors, into the XhoI
site of the TK vector, pIC-19-R/TK to form plasmid pBact/TK.
See FIG. 7A.

In step 1B (FIG. 7B), the 7.2 kb SalI fragment from a
factor VIII cDNA clone including its native signal sequence
(Kaufman, et al. (1988), JBC, 263, 6352; Toole, et al.
(1986), Proc. Natl. Acad. Sci. U.S.A., 83, 5939) is inserted
next to the Neo r gene in a pMC1 derivative plasmid. This
places the neo r gene (containing its own promoter/enhancer)
3' to the polyadenylation site of factor VIII. This plasmid is
designated pFVIII/Neo.

In step 2 (FIG. 7C), the factor VIII/Neo fragment is
excised with XhoI as a single piece and inserted using
synthetic XhoI/NcoI adaptors at the NcoI site encompassing
the met-initiation codon in pBact/TK. This codon lies in the
2nd exon of the β-actin gene, well away from the promoter,
such that transcription and splicing of the mRNA proceed in the
normal fashion. The vector so formed is designated pBact/FVIII/Neo/TK.

This vector is digested with either ClaI or HindIII, which
cut in the polylinker adjacent to the TK gene. The linearized
vector is then introduced by electroporation into endothelial
cells isolated from a hemophiliac patient. The cells are then
selected for G418 and gancyclovir resistance. Those cells
shown by DNA analysis to contain the factor VIII gene
targeted to the β-actin locus or cells shown to express FVIII
are then seeded into a vascular graft which is subsequently
implanted into the patient's vascular system.

EXAMPLE 5 



Replacement of a mutant PNP gene in human bone 
marrow stem cells using PNS 

The genomic clone of a normal purine nucleoside phosphorylase
(PNP) gene, available as a 12.4 kb, Xba-partial
fragment (Williams, et al. (1984), Nucl. Acids Res., 12, 5779;
Williams, et al. (1987), J. Biol. Chem., 262, 2332) is inserted
at the XbaI site in the vector, pIC-19-R/TK. The neo r gene
from pMC1-Neo is inserted, using synthetic BamHI/XhoI
linkers, into the BamHI site in intron 1 of the PNP gene. The
linearized version of this vector (cut with ClaI) is illustrated
in FIG. 8.

Bone marrow stem cells from PNP patients transfected
with this vector are selected for neo r, gan r in culture, and
those cells exhibiting replacement of the mutant gene with
the vector gene are transplanted into the patient.

EXAMPLE 6 

Inactivation by insertional mutagenesis of the Hox 
1.1 locus in mouse ES cells, using a promoterless 
PNS vector 

A promoterless positive selection marker is obtained
using the Neo r gene, excised at its 5' end by the enzyme EcoRI
from the plasmid pMC1-Neo. Such a digestion removes the
Neo structural gene from its controlling elements.
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A promoterless PNS vector is used to insert the Neo gene 
into the Hox 1.1 gene in ES cells. The Hox 1.1 gene is
expressed in cultured embryo cells (Colberg-Poley, et al.
(1985), Nature, 314, 713) and the site of insertion, the
second exon, lies 3' to the promoter of the gene (Kessel, et
al. (1987), PNAS, 84, 5306; Zimmer, et al. (1989), Nature,
338, 150). Expression of Neo will thus be dependent upon
insertion at the Hox 1.1 locus.

Vector construction is as follows: 

Step 1 — The neo gene, missing the transcriptional control
sequences, is removed from pMC1-Neo and inserted into the
second exon of the 11 kb FspI-KpnI fragment of Hox 1.1
(Kessel, et al. (1987), supra; Zimmer, et al. (1989), supra).

Step 2 — The Hox 1.1-Neo sequence is then inserted
adjacent to the HSV-tk gene in pIC19R/TK, creating the
targeting vector, pHox1.1-N/TK. The linearized version of
this vector is shown in FIG. 9. This vector is electroporated
into ES cells, which are then selected for Neo r, GanC. The
majority of cells surviving this selection are predicted to
contain targeted insertions of Neo at the Hox 1.1 locus.

EXAMPLE 7 

Inducible promoters

PNS vectors are used to insert novel control elements, for
example inducible promoters, into specific genetic loci. This
permits the induction of specified proteins under the spatial
and/or temporal control of the investigator. In this example,
the MT-I promoter is inserted by PNS into the Int-2 gene in
mouse ES cells.

The inducible promoter from the mouse metallothionein-I
(MT-I) locus is targeted to the Int-2 locus. Mice generated
from ES cells containing this alteration have an Int-2 gene
inducible by the presence of heavy metals. The expression of
this gene in mammary cells is predicted to result in oncogenesis
and provides an opportunity to observe the induction
of the disease.

Vector construction is as follows:

Step 1— The EcoRI-BglII fragment from the MT-I gene
(Palmiter, et al. (1982), Cell, 29, 701) is inserted by blunt-end
ligation into the BssHII site, 5' to the Int-2 structural
gene in the plasmid, pAT153 (see discussion of Example 1).

Step 2— The MC1-Neo gene is inserted into the AvrII site
in intron 2 of the Int-2-MT-I construct.

Step 3— The Int-2-MT-I-Neo fragment is inserted into
the vector, pIC19R/TK, resulting in the construct shown in
FIG. 10.

Introduction of this gene into mouse ES cells by electroporation,
followed by Neo r, GanC selection, results in
cells containing the MT-I promoter inserted 5' to the Int-2
gene. These cells are then inserted into mouse blastocysts to
generate mice carrying this particular allele.

EXAMPLE 8 

Inactivation of the ALS-II gene in tobacco 
protoplasts by PNS 

A number of herbicides function by targeting specific
plant metabolic enzymes. Mutant alleles of the genes encoding
these enzymes have been identified which confer resistance
to specific herbicides. Protoplasts containing these
mutant alleles have been isolated in culture and grown to
mature plants which retain the resistant phenotype (Botterman,
et al. (1988), TIGS, 4, 219; Gasser, et al. (1989),
Science, 244, 1293). One problem with this technology is
that the enzymes involved are often active in multimer form,
and are coded by more than one genetic locus. Thus, plants
containing a normal (sensitive) allele at one locus and a
resistant allele at another locus produce enzymes with mixed
subunits which show unpredictable resistance characteristics.

In this example, the gene product of the ALS genes
(acetolactate synthase) is the target for both sulfonylurea and
imidazolinone herbicides (Lee, et al. (1988), EMBO, 7,
1241). Protoplasts resistant to these herbicides have been
isolated and shown to contain mutations in one of the two
ALS loci. A 10 kb SpeI fragment of the ALS-II gene (Lee,
et al. (1988), supra; Mazur, et al. (1987), Plant Phys., 85,
1110) is subcloned into the negative selection vector, pIC-19R/TK.
A neo r gene, engineered for expression in plant
cells with regulating sequences from the mannopine synthase
gene of the Ti plasmid, is inserted into the EcoRI site
in the coding region of ALS-II. This PNS vector is
transferred to the C3 tobacco cell line (Chalef, et al. (1984),
Science, 223, 1148), carrying a chlorsulfuron r allele in ALS-I.

The cells are then selected for Neo r, GanC. Those cells
surviving selection are screened by DNA blots for candidates
containing insertions in the ALS-II gene.

Having described the preferred embodiments of the 
present invention, it will appear to those ordinarily skilled in 
the art that various modifications may be made to the 
disclosed embodiments, and that such modifications are 
intended to be within the scope of the present invention. 

What is claimed is: 

1. A positive-negative selection (PNS) vector for modi- 
fying a target DNA sequence contained in the genomes of 
murine embryonic stem cells, said PNS vector comprising: 

a first homologous vector DNA sequence capable of
homologous recombination with a first region of said
target DNA sequence,

a positive selection marker DNA sequence capable of
conferring a positive selection characteristic in said
cells,

a second homologous vector DNA sequence capable of 
homologous recombination with a second region of 
said target DNA sequence, and 

a negative selection marker DNA sequence, capable of 
conferring a negative selection characteristic in said 
cells, thereby allowing killing of said cells, but sub- 
stantially incapable of homologous recombination with 
said target DNA sequence, 

wherein the spatial order of said sequences in said PNS 
vector is: said first homologous vector DNA sequence, 
said positive selection marker DNA sequence, said 
second homologous vector DNA sequence and said 
negative selection marker DNA sequence as shown in 
FIG. 1, 

wherein the 5'-3' orientation of said first homologous 
vector sequence relative to said second homologous 
vector sequence is the same as the 5'-3' orientation of
said first region relative to said second region of said 
target sequence; 

wherein the vector is capable of modifying said target 
DNA sequence by homologous recombination of said 
first homologous vector DNA sequence with said first 
region of said target sequence and of said second 
homologous vector DNA sequence with said second 
region of said target sequence. 

2. The PNS vector of claim 1 wherein said target DNA 
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contains exons and introns and said positive selection 
marker DNA sequence further contains the exon-intron and 
intron-exon splicing sequences for an intron in said target 
DNA. 

3. The PNS vector of claim 2 wherein said first or said
second homologous vector DNA sequence contains at least
a portion of an exon wherein one or more nucleotides have
been substituted, deleted or inserted.

4. The PNS vector of claim 1 wherein said target DNA 
sequence contains exons and introns and said first and
second homologous vector DNA sequences contain different 
portions of the same exon of said target DNA sequence. 

5. The PNS vector of claim 1 wherein said PNS vector has 
a length between 20 kb and 50 kb. 

6. The PNS vector of claim 1 wherein said first and said
second homologous vector DNA sequences have a length 
between 25 base pairs and 50,000 base pairs each. 

7. The PNS vector of claim 1 wherein said first and said 
second homologous vector DNA sequences have a length 
between 1,000 base pairs and 15,000 base pairs each.

8. The PNS vector of claim 1 wherein said positive 
selection marker DNA sequence is selected from the group 
consisting of DNA sequences encoding neomycin resis- 
tance, hygromycin resistance, histidinol resistance, xanthine 
utilization and bleomycin resistance.

9. The PNS vector of claim 8 wherein said positive
selection marker is a DNA sequence encoding neomycin
resistance.



10. The PNS vector of claim 1 wherein said negative 
selection marker DNA sequence is selected from the group
consisting of DNA sequences encoding Hprt, gpt, HSV-tk, 
diphtheria toxin, ricin toxin and cytosine deaminase. 

11. The PNS vector of claim 1 wherein said negative 
selection marker is a DNA sequence encoding HSV-tk. 

12. The PNS vector of claim 1 wherein said first or said
second homologous vector DNA sequence comprises a DNA 
sequence having a modification of said target DNA 
sequence. 

13. The PNS vector of claim 12 wherein said modification
is an insertion of one or more nucleotides.

14. The PNS vector of claim 1 wherein said first or said 
second homologous vector DNA sequence encodes the 
correction of a genetic defect in said target DNA sequence. 

15. The PNS vector of claim 14 wherein said genetic 
defect in said target DNA comprises the insertion of one or
more nucleotides in said target DNA sequence. 

16. The PNS vector of claim 15 wherein said genetic 
defect is associated with hemoglobinopathies, deficiencies 
in circulatory factors, intracellular enzymes or extracellular 
enzymes.

17. The PNS vector of claim 1 wherein said PNS vector 
is linear. 

18. The PNS vector of claim 1 wherein said PNS vector 
is closed circular. 

19. A method for enriching for a transformed murine
embryonic stem cell containing a modification in a target 
DNA sequence in the genome of said cell comprising: 

(a) transfecting cells capable of mediating homologous 
recombination with a positive-negative selection vector 
comprising:
a first homologous vector DNA sequence capable of
homologous recombination with a first region of said
target DNA sequence,
a positive selection marker DNA sequence capable of
conferring a positive selection characteristic in said
cells,
a second homologous vector DNA sequence capable of
homologous recombination with a second region of
said target DNA sequence, and
a negative selection marker DNA sequence, capable of 
conferring a negative selection characteristic in said 
cells, thereby allowing killing of said cells but sub- 
stantially incapable of homologous recombination 
with said target DNA sequence, 
wherein the spatial order of said sequences in said PNS 
vector is: said first homologous vector DNA 
sequence, said positive selection marker DNA 
sequence, said second homologous vector DNA 
sequence and said negative selection marker DNA 
sequence as shown in FIG. 1, 
wherein the 5'-3' orientation of said first homologous 
vector sequence relative to said second homologous 
vector sequence is the same as the 5'-3' orientation of 
said first region relative to said second region of said 
target sequence; 
wherein the vector is capable of modifying the target 
DNA sequence by homologous recombination of 
said first and second homologous vector sequences 
with the first and second regions of said target 
sequence; 

(b) selecting for transformed cells in which said positive- 
negative selection vector has integrated into said target 
DNA sequence by homologous recombination by 
sequentially or simultaneously selecting against trans- 
formed cells containing said negative selection marker 
and selecting for cells containing said positive selection 
marker; and 

(c) analyzing the DNA of transformed cells surviving the
selecting step to identify a cell containing the modification.

20. The method of claim 19 wherein said target DNA 
contains exons and introns and said positive selection 
marker DNA sequence further contains the exon-intron and 
intron-exon splice sequences for an intron in said target 
DNA sequence. 

21. The method of claim 19 wherein said first or said 
second homologous vector DNA sequence contains at least 
a portion of an exon of said target DNA sequence wherein 
one or more nucleotides of said target sequence have been 
substituted, deleted or inserted. 

22. The method of claim 19 wherein said target DNA 
sequence contains exons and introns and said first and 
second homologous vector DNA sequences contain different 
portions of the same exon of said target DNA sequence. 

23. The method of claim 19 wherein said PNS vector has 
a length between 20 kb and 50 kb. 

24. The method of claim 19 wherein said first and said 
second homologous vector DNA sequences have a length 
between 1,000 base pairs and 15,000 base pairs each. 

25. The method of claim 19 wherein said positive selec- 
tion marker DNA sequence is selected from the group 
consisting of DNA sequences encoding neomycin resis- 
tance, hygromycin resistance, histidinol resistance, xanthine 
utilization and bleomycin resistance. 

26. The method of claim 25 wherein said positive selection
marker is a DNA sequence encoding neomycin resistance.

27. The method of claim 19 wherein said negative selec- 
tion marker DNA sequence is selected from the group 
consisting of DNA sequences encoding Hprt, gpt, HSV-tk, 
diphtheria toxin, ricin toxin or cytosine deaminase. 

28. The method of claim 27 wherein said negative selec- 
tion marker is a DNA sequence encoding HSV-tk. 

29. The method of claim 19 wherein said first or said 
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second homologous vector DNA sequence in said PNS 
vector further comprises a DNA sequence having a modi- 
fication of said target DNA sequence. 

30. The method of claim 29 wherein said modification is
a substitution, insertion or deletion of one or more nucleotides.

31. The PNS method of claim 19 wherein said first or said 
second homologous vector DNA sequences in said PNS 
vector encodes the correction of a genetic defect in said 
target DNA.

32. The method of claim 31 wherein said genetic defect in 
said target DNA comprises the insertion of one or more 
nucleotides in said target DNA sequence. 

33. The method of claim 32 wherein said genetic defect is
associated with hemoglobinopathies, deficiencies in circulatory
factors, extracellular enzymes or intracellular
enzymes.



34. The method of claim 19 wherein said PNS vector is 
linear. 

35. The method of claim 19 wherein said PNS vector is
closed circular. 

36. The PNS vector of claim 1, wherein said target DNA 
sequence is a gene. 

37. The PNS vector of claim 1, wherein said target DNA 
sequence is a regulatory sequence. 
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38. The PNS vector of claim 12, wherein said modifica- 
tion is a deletion of one or more nucleotides in said target 
DNA sequence. 

39. The PNS vector of claim 14, wherein said genetic 
defect in said target DNA comprises the deletion of one or 
more nucleotides in said target DNA sequence. 

40. The PNS vector of claim 12, wherein said modifica- 
tion is a substitution of one or more nucleotides in said target 
DNA sequence. 

41. The PNS vector of claim 14, wherein said genetic 
defect in said target DNA comprises the substitution of one 
or more nucleotides in said target DNA sequence. 

42. The method of claim 31, wherein said genetic defect 
in said target DNA comprises the deletion of one or more 
nucleotides in said target DNA sequence. 

43. The method of claim 31, wherein said genetic defect 
in said target DNA comprises the substitution of one or more 
nucleotides in said target DNA sequence. 

44. The method of claim 19 wherein said vector is a 
sequence replacement vector and said first and second 
homologous vector DNA sequences comprise contiguous 
first and second regions in said target DNA sequence. 
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ABSTRACT 



Recombinational cloning is provided by the use of nucleic 
acids, vectors and methods, in vitro and in vivo, for moving 
or exchanging segments of DNA molecules using engi- 
neered recombination sites and recombination proteins to 
provide chimeric DNA molecules that have the desired 
characteristic(s) and/or DNA segment(s). 
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RECOMBINATIONAL CLONING USING
ENGINEERED RECOMBINATION SITES

CROSS-REFERENCE TO RELATED 
APPLICATIONS 
The present application is a continuation-in-part of U.S. 
application Ser. No. 08/486,139, filed Jun. 7, 1995, now 
abandoned which application is entirely incorporated herein 
by reference. 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to recombinant DNA tech- 
nology. DNA and vectors having engineered recombination 
sites are provided for use in a recombinational cloning 
method that enables efficient and specific recombination of 
DNA segments using recombination proteins. The DNAs, 
vectors and methods are useful for a variety of DNA 
exchanges, such as subcloning of DNA, in vitro or in vivo. 

2. Related Art 

Site specific recombinases. Site specific recombinases are 
enzymes that are present in some viruses and bacteria and 
have been characterized to have both endonuclease and 
ligase properties. These recombinases (along with associ- 
ated proteins in some cases) recognize specific sequences of 
bases in DNA and exchange the DNA segments flanking 
those segments. The recombinases and associated proteins 
are collectively referred to as "recombination proteins" (see, 
e.g., Landy, A., Current Opinion in Biotechnology 
3:699-707 (1993)). 

Numerous recombination systems from various organisms
have been described. See, e.g., Hoess et al., Nucleic
Acids Research 14(6):2287 (1986); Abremski et al., J. Biol.
Chem. 261(1):391 (1986); Campbell, J. Bacteriol. 174(23):7495
(1992); Qian et al., J. Biol. Chem. 267(11):7794
(1992); Araki et al., J. Mol. Biol. 225(1):25 (1992); Maeser
and Kahnmann (1991) Mol. Gen. Genet. 230:170-176.

Many of these belong to the integrase family of recombinases
(Argos et al. EMBO J. 5:433-440 (1986)). Perhaps
the best studied of these are the Integrase/att system from
bacteriophage λ (Landy, A. Current Opinions in Genetics
and Devel. 3:699-707 (1993)), the Cre/loxP system from
bacteriophage P1 (Hoess and Abremski (1990) In Nucleic
Acids and Molecular Biology, vol. 4. Eds.: Eckstein and
Lilley, Berlin-Heidelberg: Springer-Verlag; pp. 90-109),
and the FLP/FRT system from the Saccharomyces cerevisiae
2 µ circle plasmid (Broach et al. Cell 29:227-234 (1982)).

While these recombination systems have been characterized
for particular organisms, the related art has only taught
using recombinant DNA flanked by recombination sites, for
in vivo recombination.

Backman (U.S. Pat. No. 4,673,640) discloses the in vivo
use of λ recombinase to recombine a protein producing
DNA segment by enzymatic site-specific recombination
using wild-type recombination sites attB and attP.

Hasan and Szybalski (Gene 56:145-151 (1987)) disclose
the use of λ Int recombinase in vivo for intramolecular
recombination between wild type attP and attB sites which
flank a promoter. Because the orientations of these sites are
inverted relative to each other, this causes an irreversible
flipping of the promoter region relative to the gene of
interest.
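
The effect of site orientation on an intramolecular recombination event can be illustrated with a toy model. The sketch below is a simplification offered under stated assumptions rather than a description of any cited method: a DNA molecule is represented as an ordered list of named elements, each with an orientation of +1 or -1; two sites in opposite orientation invert the segment between them, and two sites in the same orientation excise it as a circle. The model does not track the conversion of attP and attB into hybrid sites, and all names are hypothetical.

```python
from typing import List, Tuple

Element = Tuple[str, int]  # (name, orientation), orientation is +1 or -1


def recombine(dna: List[Element], i: int, j: int):
    """Recombine the two sites at indices i < j on one linear molecule.

    Returns ("inversion", new_dna) when the sites have opposite orientation,
    or ("excision", remaining_dna, excised_circle) when they share orientation.
    """
    if not i < j:
        raise ValueError("expected i < j")
    if dna[i][1] != dna[j][1]:
        # Inverted sites: reverse the order of the intervening elements
        # and flip the orientation of each one.
        flipped = [(name, -strand) for name, strand in reversed(dna[i + 1:j])]
        return ("inversion", dna[:i + 1] + flipped + dna[j:])
    # Directly repeated sites: the intervening segment leaves as a circle
    # carrying one site, and one site remains on the parent molecule.
    return ("excision", dna[:i + 1] + dna[j + 1:], dna[i + 1:j + 1])


# Promoter initially pointing away from the gene, flanked by inverted sites.
construct = [("attP", 1), ("promoter", -1), ("attB", -1), ("gene", 1)]
outcome = recombine(construct, 0, 2)
print(outcome)

# Same layout with directly repeated sites gives excision instead.
direct = [("site", 1), ("insert", 1), ("site", 1), ("gene", 1)]
print(recombine(direct, 0, 2))
```

In the first case the promoter ends up in the same orientation as the gene, which corresponds to the promoter flipping described above.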

Palazzolo et al. (Gene 88:25-36 (1990)) disclose phage
lambda vectors having bacteriophage λ arms that contain
restriction sites positioned outside a cloned DNA sequence
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and between wild-type loxP sites. Infection of E. coli cells 
that express the Cre recombinase with these phage vectors 
results in recombination between the loxP sites and the in 
vivo excision of the plasmid replicon, including the cloned 
cDNA.

Posfai et al. (Nucl. Acids Res. 22:2392-2398 (1994))
disclose a method for inserting into genomic DNA partial
expression vectors having a selectable marker, flanked by
two wild-type FRT recognition sequences. FLP site-specific
recombinase as present in the cells is used to integrate the
vectors into the genome at predetermined sites. Under
conditions where the replicon is functional, this cloned
genomic DNA can be amplified.

Bebee et al. (U.S. Pat. No. 5,434,066) discloses the use of
site-specific recombinases such as Cre on DNA containing
two loxP sites for in vivo recombination between the
sites.

Boyd (Nucl. Acids Res. 21:817-821 (1993)) discloses a
method to facilitate the cloning of blunt-ended DNA using
conditions that encourage intermolecular ligation to a
dephosphorylated vector that contains a wild-type loxP site
acted upon by a Cre site-specific recombinase present in E.
coli host cells.

Waterhouse et al. (PCT No. 93/19172 and Nucleic Acids
Res. 21 (9):2265 (1993)) disclose an in vivo method where
light and heavy chains of a particular antibody were cloned
in different phage vectors between loxP and loxP 511 sites
and used to transfect new E. coli cells. Cre, acting in the host
cells on the two parental molecules (one plasmid, one
phage), produced four products in equilibrium: two different
cointegrates (produced by recombination at either loxP or
loxP 511 sites), and two daughter molecules, one of which
was the desired product.

In contrast to the other related art, Schlake & Bode
(Biochemistry 33:12746-12751 (1994)) discloses an in vivo
method to exchange expression cassettes at defined
chromosomal locations, each flanked by a wild type and a
spacer-mutated FRT recombination site. A double-reciprocal
crossover was mediated in cultured mammalian cells by
using this FLP/FRT system for site-specific recombination.

Transposases. The family of enzymes, the transposases,
has also been used to transfer genetic information between
replicons. Transposons are structurally variable, being
described as simple or compound, but typically encode the
recombinase gene flanked by DNA sequences organized in
inverted orientations. Integration of transposons can be
random or highly specific. Representatives such as Tn7,
which are highly site-specific, have been applied to the in
vivo movement of DNA segments between replicons
(Lucklow et al., J. Virol. 67:4566-4579 (1993)).

Devine and Boeke Nucl. Acids Res. 22:3765-3772
(1994), discloses the construction of artificial transposons
for the insertion of DNA segments, in vitro, into recipient
DNA molecules. The system makes use of the integrase of
yeast TY1 virus-like particles. The DNA segment of interest
is cloned, using standard methods, between the ends of the
transposon-like element TY1. In the presence of the TY1
integrase, the resulting element integrates randomly into a
second target DNA molecule.

DNA cloning. The cloning of DNA segments currently
occurs as a daily routine in many research labs and as a
prerequisite step in many genetic analyses. The purposes of
these clonings are various; however, two general purposes can
be considered: (1) the initial cloning of DNA from large
DNA or RNA segments (chromosomes, YACs, PCR
fragments, mRNA, etc.), done in a relative handful of known
vectors such as pUC, pGem, pBlueScript, and (2) the
subcloning of these DNA segments into specialized vectors
for functional analysis. A great deal of time and effort is
expended both in the initial cloning of DNA segments and
in the transfer of DNA segments from the initial cloning
vectors to the more specialized vectors. This transfer is
called subcloning.

The basic methods for cloning have been known for many
years and have changed little during that time. A typical
cloning protocol is as follows:

(1) digest the DNA of interest with one or two restriction
enzymes;

(2) gel purify the DNA segment of interest when known;

(3) prepare the vector by cutting with appropriate
restriction enzymes, treating with alkaline phosphatase, gel
purify etc., as appropriate;

(4) ligate the DNA segment to vector, with appropriate
controls to estimate background of uncut and self-ligated
vector;

(5) introduce the resulting vector into an E. coli host cell;

(6) pick selected colonies and grow small cultures
overnight;

(7) make DNA minipreps; and

(8) analyze the isolated plasmid on agarose gels (often
after diagnostic restriction enzyme digestions) or by
PCR.

The specialized vectors used for subcloning DNA segments
are functionally diverse. These include but are not
limited to: vectors for expressing genes in various organisms;
for regulating gene expression; for providing tags to
aid in protein purification or to allow tracking of proteins in
cells; for modifying the cloned DNA segment (e.g., generating
deletions); for the synthesis of probes (e.g.,
riboprobes); for the preparation of templates for DNA
sequencing; for the identification of protein coding regions;
for the fusion of various protein-coding regions; to provide
large amounts of the DNA of interest, etc. It is common that
a particular investigation will involve subcloning the DNA
segment of interest into several different specialized vectors.

As known in the art, simple subclonings can be done in
one day (e.g., the DNA segment is not large and the
restriction sites are compatible with those of the subcloning
vector). However, many other subclonings can take several
weeks, especially those involving unknown sequences, long
fragments, toxic genes, unsuitable placement of restriction
sites, high backgrounds, impure enzymes, etc. Subcloning
DNA fragments is thus often viewed as a chore to be done
as few times as possible.

Several methods for facilitating the cloning of DNA
segments have been described, e.g., as in the following
references.

Ferguson, J., et al. Gene 16:191 (1981), discloses a family
of vectors for subcloning fragments of yeast DNA. The
vectors encode kanamycin resistance. Clones of longer yeast
DNA segments can be partially digested and ligated into the
subcloning vectors. If the original cloning vector conveys
resistance to ampicillin, no purification is necessary prior to
transformation, since the selection will be for kanamycin.

Hashimoto-Gotoh, T., et al. Gene 41:125 (1986), discloses
a subcloning vector with unique cloning sites within a
streptomycin sensitivity gene; in a streptomycin-resistant
host, only plasmids with inserts or deletions in the dominant
sensitivity gene will survive streptomycin selection.

Accordingly, traditional subcloning methods, using
restriction enzymes and ligase, are time consuming and
relatively unreliable. Considerable labor is expended, and if
two or more days later the desired subclone can not be found
among the candidate plasmids, the entire process must then
be repeated with alternative conditions attempted. Although
site specific recombinases have been used to recombine
DNA in vivo, the successful use of such enzymes in vitro
was expected to suffer from several problems. For example,
the site specificities and efficiencies were expected to differ
in vitro; topologically-linked products were expected; and
the topology of the DNA substrates and recombination
proteins was expected to differ significantly in vitro (see,
e.g., Adams et al., J. Mol. Biol. 226:661-73 (1992)).
Reactions that could go on for many hours in vivo were expected
to occur in significantly less time in vitro before the enzymes
became inactive. Multiple DNA recombination products
were expected in the biological host used, resulting in
unsatisfactory reliability, specificity or efficiency of
subcloning. In vitro recombination reactions were not expected to be
sufficiently efficient to yield the desired levels of product.

Accordingly, there is a long felt need to provide an
alternative subcloning system that provides advantages over
the known use of restriction enzymes and ligases.

SUMMARY OF THE INVENTION

The present invention provides nucleic acid, vectors and
methods for obtaining chimeric nucleic acid using
recombination proteins and engineered recombination sites, in
vitro or in vivo. These methods are highly specific, rapid,
and less labor intensive than what is disclosed or suggested
in the related background art. The improved specificity,
speed and yields of the present invention facilitate DNA or
RNA subcloning, regulation or exchange useful for any
related purpose. Such purposes include in vitro recombination
of DNA segments and in vitro or in vivo insertion or
modification of transcribed, replicated, isolated or genomic
DNA or RNA.

The present invention relates to nucleic acids, vectors and
methods for moving or exchanging segments of DNA using
at least one engineered recombination site and at least one
recombination protein to provide chimeric DNA molecules
which have the desired characteristic(s) and/or DNA
segment(s). Generally, one or more parent DNA molecules
are recombined to give one or more daughter molecules, at
least one of which is the desired Product DNA segment or
vector. The invention thus relates to DNA, RNA, vectors and
methods to effect the exchange and/or to select for one or
more desired products.

One embodiment of the present invention relates to a
method of making chimeric DNA, which comprises

(a) combining in vitro or in vivo

(i) an Insert Donor molecule, comprising a desired
DNA segment flanked by a first recombination site and
a second recombination site, wherein the first and
second recombination sites do not recombine with each
other;

(ii) a Vector Donor DNA molecule containing a third
recombination site and a fourth recombination site,
wherein the third and fourth recombination sites do not
recombine with each other; and

(iii) one or more site specific recombination proteins
capable of recombining the first and third recombinational
sites and/or the second and fourth recombinational
sites;

thereby allowing recombination to occur, so as to produce
at least one Cointegrate DNA molecule, at least one
desired Product DNA molecule which comprises said
desired DNA segment, and optionally a Byproduct
DNA molecule; and then, optionally,

(b) selecting for the Product or Byproduct DNA molecule.

Another embodiment of the present invention relates to a
kit comprising a carrier or receptacle being compartmentalized
to receive and hold therein at least one container,
wherein a first container contains a DNA molecule comprising
a vector having at least two recombination sites flanking
a cloning site or a Selectable marker, as described herein.
The kit optionally further comprises:

(i) a second container containing a Vector Donor plasmid
comprising a subcloning vector and/or a Selectable
marker of which one or both are flanked by one or more
engineered recombination sites; and/or

(ii) a third container containing at least one recombination
protein which recognizes and is capable of recombining
at least one of said recombination sites.

Other embodiments include DNA and vectors useful in
the methods of the present invention. In particular, Vector
Donor molecules are provided in one embodiment, wherein
DNA segments within the Vector Donor are separated either
by, (i) in a circular Vector Donor, at least two recombination
sites, or (ii) in a linear Vector Donor, at least one
recombination site, where the recombination sites are preferably
engineered to enhance specificity or efficiency of
recombination.

One Vector Donor embodiment comprises a first DNA
segment and a second DNA segment, the first or second
segment comprising a Selectable marker. A second Vector
Donor embodiment comprises a first DNA segment and a
second DNA segment, the first or second DNA segment
comprising a toxic gene. A third Vector Donor embodiment
comprises a first DNA segment and a second DNA segment,
the first or second DNA segment comprising an inactive
fragment of at least one Selectable marker, wherein the
inactive fragment of the Selectable marker is capable of
reconstituting a functional Selectable marker when
recombined across the first or second recombination site with
another inactive fragment of at least one Selectable marker.

The present recombinational cloning method possesses
several advantages over previous in vivo methods. Since
single molecules of recombination products can be
introduced into a biological host, propagation of the desired
Product DNA in the absence of other DNA molecules (e.g.,
starting molecules, intermediates, and by-products) is more
readily realized. Reaction conditions can be freely adjusted
in vitro to optimize enzyme activities. DNA molecules that
are incompatible with the desired biological host (e.g., YACs,
genomic DNA, etc.) can be used. Recombination proteins
from diverse sources can be employed, together or
sequentially.

Other embodiments will be evident to those of ordinary
skill in the art from the teachings contained herein in
combination with what is known to the art.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts one general method of the present
invention, wherein the starting (parent) DNA molecules can
be circular or linear. The goal is to exchange the new
subcloning vector D for the original cloning vector B. It is
desirable in one embodiment to select for AD and against all
the other molecules, including the Cointegrate. The square
and circle are sites of recombination: e.g., loxP sites, att
sites, etc. For example, segment D can contain expression
signals, new drug markers, new origins of replication, or
specialized functions for mapping or sequencing DNA.

FIG. 2A depicts an in vitro method of recombining an
Insert Donor plasmid (here, pEZC705) with a Vector Donor
plasmid (here, pEZC726), and obtaining Product DNA and
Byproduct daughter molecules. The two recombination sites
are attP and loxP on the Vector Donor. On one segment
defined by these sites is a kanamycin resistance gene whose
promoter has been replaced by the tetOP operator/promoter
from transposon Tn10. See Sizemore et al., Nucl. Acids Res.
18(10):2875 (1990). In the absence of tet repressor protein,
E. coli RNA polymerase transcribes the kanamycin resistance
gene from the tetOP. If tet repressor is present, it binds
to tetOP and blocks transcription of the kanamycin resistance
gene. The other segment of pEZC726 has the tet
repressor gene expressed by a constitutive promoter. Thus
cells transformed by pEZC726 are resistant to
chloramphenicol, because of the chloramphenicol acetyl
transferase gene on the same segment as tetR, but are
sensitive to kanamycin. The recombinase-mediated reactions
result in separation of the tetR gene from the regulated
kanamycin resistance gene. This separation results in
kanamycin resistance in cells receiving only the desired
recombination products. The first recombination reaction is driven
by the addition of the recombinase called Integrase. The
second recombination reaction is driven by adding the
recombinase Cre to the Cointegrate (here, pEZC7
Cointegrate).

FIG. 2B depicts a restriction map of pEZC705.

FIG. 2C depicts a restriction map of pEZC726.

FIG. 2D depicts a restriction map of pEZC7 Cointegrate.

FIG. 2E depicts a restriction map of Intprod.

FIG. 2F depicts a restriction map of [illegible].

FIG. 3A depicts an in vitro method of recombining an
Insert Donor plasmid (here, pEZC602) with a Vector Donor
plasmid (here, pEZC629), and obtaining Product (here,
EZC6prod) and Byproduct (here, EZC6Bypr) daughter
molecules. The two recombination sites are loxP and loxP 511.
One segment of pEZC629 defined by these sites is a
kanamycin resistance gene whose promoter has been replaced by
the tetOP operator/promoter from transposon Tn10. In the
absence of tet repressor protein, E. coli RNA polymerase
transcribes the kanamycin resistance gene from the tetOP. If
tet repressor is present, it binds to tetOP and blocks
transcription of the kanamycin resistance gene. The other
segment of pEZC629 has the tet repressor gene expressed by a
constitutive promoter. Thus cells transformed by pEZC629
are resistant to chloramphenicol, because of the
chloramphenicol acetyl transferase gene on the same segment as
tetR, but are sensitive to kanamycin. The reactions result in
separation of the tetR gene from the regulated kanamycin
resistance gene. This separation results in kanamycin
resistance in cells receiving the desired recombination product.
The first and the second recombination events are driven by
the addition of the same recombinase, Cre.

FIG. 3B depicts a restriction map of EZC6Bypr.

FIG. 3C depicts a restriction map of EZC6prod.

FIG. 3D depicts a restriction map of pEZC602.

FIG. 3E depicts a restriction map of pEZC629.

FIG. 3F depicts a restriction map of EZC6coint.

FIG. 4A depicts an application of the in vitro method of
recombinational cloning to subclone the chloramphenicol
acetyl transferase gene into a vector for expression in
eukaryotic cells. The Insert Donor plasmid, pEZC843, is
comprised of the chloramphenicol acetyl transferase gene of
E. coli, cloned between loxP and attB sites such that the loxP
site is positioned at the 5'-end of the gene. The Vector Donor
plasmid, pEZC1003, contains the cytomegalovirus
eukaryotic promoter apposed to a loxP site. The supercoiled
plasmids were combined with lambda Integrase and Cre
recombinase in vitro. After incubation, competent E. coli
cells were transformed with the recombinational reaction
solution. Aliquots of transformations were spread on agar
plates containing kanamycin to select for the Product
molecule (here CMVProd).

FIG. 4B depicts a restriction map of pEZC843. 

FIG. 4C depicts a restriction map of pEZC1003. 

FIG. 4D depicts a restriction map of CMVBypro. 

FIG. 4E depicts a restriction map of CMVProd. 

FIG. 4F depicts a restriction map of CMVcoint. 

FIG. 5A depicts a vector diagram of pEZC1301. 

FIG. 5B depicts a vector diagram of pEZC1305. 

FIG. 5C depicts a vector diagram of pEZC1309. 

FIG. 5D depicts a vector diagram of pEZC1313. 

FIG. 5E depicts a vector diagram of pEZC1317. 

FIG. 5F depicts a vector diagram of pEZC1321. 

FIG. 5G depicts a vector diagram of pEZC1405. 

FIG. 5H depicts a vector diagram of pEZC1502. 

FIG. 6A depicts a vector diagram of pEZC1603. 

FIG. 6B depicts a vector diagram of pEZC1706. 

FIG. 7A depicts a vector diagram of pEZC2901. 

FIG. 7B depicts a vector diagram of pEZC2913.

FIG. 7C depicts a vector diagram of pEZC3101. 

FIG. 7D depicts a vector diagram of pEZC1802. 

FIG. 8A depicts a vector diagram of pGEX-2TK. 

FIG. 8B depicts a vector diagram of pEZC3501. 

FIG. 8C depicts a vector diagram of pEZC3601. 

FIG. 8D depicts a vector diagram of pEZC3609. 

FIG. 8E depicts a vector diagram of pEZC3617. 

FIG. 8F depicts a vector diagram of pEZC3606. 

FIG. 8G depicts a vector diagram of pEZC3613. 

FIG. 8H depicts a vector diagram of pEZC3621. 

FIG. 8I depicts a vector diagram of GST-CAT.

FIG. 8J depicts a vector diagram of GST-phoA. 

FIG. 8K depicts a vector diagram of pEZC3201. 
DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

It is unexpectedly discovered in the present invention that 
subcloning reactions can be provided using recombinational 
cloning. Recombination cloning according to the present 
invention uses DNAs, vectors and methods, in vitro and in 
vivo, for moving or exchanging segments of DNA mol- 
ecules using engineered recombination sites and recombi- 
nation proteins. These methods provide chimeric DNA mol- 
ecules that have the desired characteristic(s) and/or DNA 
segment(s).

The present invention thus provides nucleic acid, vectors 
and methods for obtaining chimeric nucleic acid using 
recombination proteins and engineered recombination sites, 
in vitro or in vivo. These methods are highly specific, rapid, 
and less labor intensive than what is disclosed or suggested 
in the related background art. The improved specificity, 
speed and yields of the present invention facilitate DNA or
RNA subcloning, regulation or exchange useful for any 
related purpose. Such purposes include in vitro recombina- 
tion of DNA segments and in vitro or in vivo insertion or 
modification of transcribed, replicated, isolated or genomic 
DNA or RNA. 

Definitions 

In the description that follows, a number of terms used in
recombinant DNA technology are utilized extensively. In
order to provide a clear and consistent understanding of the
specification and claims, including the scope to be given
such terms, the following definitions are provided.

Byproduct: is a daughter molecule (a new clone produced
after the second recombination event during the
recombinational cloning process) lacking the DNA which is desired
to be subcloned.

Cointegrate: is at least one recombination intermediate
DNA molecule of the present invention that contains both
parental (starting) DNA molecules. It will usually be
circular. In some embodiments it can be linear.

Host: is any prokaryotic or eukaryotic organism that can
be a recipient of the recombinational cloning Product. A
"host," as the term is used herein, includes prokaryotic or
eukaryotic organisms that can be genetically engineered. For
examples of such hosts, see Maniatis et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, N.Y. (1982).

Insert: is the desired DNA segment (segment A of FIG. 1)
which one wishes to manipulate by the method of the present
invention. The insert can have one or more genes.

Insert Donor: is one of the two parental DNA molecules
of the present invention which carries the Insert. The Insert
Donor DNA molecule comprises the Insert flanked on both
sides with recombination signals. The Insert Donor can be
linear or circular. In one embodiment of the invention, the
Insert Donor is a circular DNA molecule and further
comprises a cloning vector sequence outside of the
recombination signals (see FIG. 1).

Product: is one or both of the desired daughter molecules
comprising the A and D or B and C sequences which are
produced after the second recombination event during the
recombinational cloning process (see FIG. 1). The Product
contains the DNA which was to be cloned or subcloned.

Promoter: is a DNA sequence generally described as the
5'-region of a gene, located proximal to the start codon. The
transcription of an adjacent DNA segment is initiated at the
promoter region. A repressible promoter's rate of
transcription decreases in response to a repressing agent. An
inducible promoter's rate of transcription increases in response to
an inducing agent. A constitutive promoter's rate of
transcription is not specifically regulated, though it can vary
under the influence of general metabolic conditions.

Recognition sequence: Recognition sequences are
particular DNA sequences which a protein, DNA, or RNA
molecule (e.g., restriction endonuclease, a modification
methylase, or a recombinase) recognizes and binds. For
example, the recognition sequence for Cre recombinase is
loxP which is a 34 base pair sequence comprised of two 13
base pair inverted repeats (serving as the recombinase
binding sites) flanking an 8 base pair core sequence. See
FIG. 1 of Sauer, B., Current Opinion in Biotechnology
5:521-527 (1994). Other examples of recognition sequences
are the attB, attP, attL, and attR sequences which are
recognized by the recombinase enzyme λ Integrase. attB is
an approximately 25 base pair sequence containing two 9
base pair core-type Int binding sites and a 7 base pair overlap
region. attP is an approximately 240 base pair sequence
containing core-type Int binding sites and arm-type Int
binding sites as well as sites for auxiliary proteins IHF, FIS,
and Xis. See Landy, Current Opinion in Biotechnology
3:699-707 (1993). Such sites are also engineered according
to the present invention to enhance methods and products.
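
The loxP and attB layouts described above can be checked with a short sketch. The loxP sequence used here is the commonly cited wild-type sequence and is an assumption for illustration; it is not given in this document. The helper names (`reverse_complement`, `split_loxp`) are likewise hypothetical.

```python
# Commonly cited wild-type loxP sequence (assumption: not stated in this document).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_loxp(site, arm_len=13, core_len=8):
    """Split a site into left arm, core, and right arm (13 + 8 + 13 layout)."""
    left = site[:arm_len]
    core = site[arm_len:arm_len + core_len]
    right = site[arm_len + core_len:]
    return left, core, right


left_arm, core, right_arm = split_loxp(LOXP)

# The two 13 bp arms are inverted repeats when one arm equals the
# reverse complement of the other.
is_inverted_repeat = reverse_complement(left_arm) == right_arm

# attB as described in the text: two 9 bp core-type Int binding sites
# plus a 7 bp overlap region, giving the approximate total length.
ATTB_APPROX_LENGTH = 9 + 7 + 9
```

The same split could be applied to an engineered site to see whether a mutation falls in an arm or in the core, although the document does not prescribe any such tool.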

Recombinase: is an enzyme which catalyzes the exchange 
of DNA segments at specific recombination sites. 

Recombinational Cloning: is a method described herein,
whereby segments of DNA molecules are exchanged,
inserted, replaced, substituted or modified, in vitro or in
vivo.

Recombination proteins: include excisive or integrative
proteins, enzymes, co-factors or associated proteins that are
involved in recombination reactions involving one or more
recombination sites. See, Landy (1994), infra.

Repression cassette: is a DNA segment that contains a 
repressor of a Selectable marker present in the subcloning
vector.

Selectable marker: is a DNA segment that allows one to 
select for or against a molecule or a cell that contains it, 
often under particular conditions. These markers can encode 
an activity, such as, but not limited to, production of RNA, 
peptide, or protein, or can provide a binding site for RNA, 
peptides, proteins, inorganic and organic compounds or 
compositions and the like. Examples of Selectable markers 
include but are not limited to: (1) DNA segments that encode 
products which provide resistance against otherwise toxic 
compounds (e.g., antibiotics); (2) DNA segments that 
encode products which are otherwise lacking in the recipient 
cell (e.g., tRNA genes, auxotrophic markers); (3) DNA 
segments that encode products which suppress the activity 
of a gene product; (4) DNA segments that encode products 
which can be readily identified (e.g., phenotypic markers 
such as β-galactosidase, green fluorescent protein (GFP),
and cell surface proteins); (5) DNA segments that bind 
products which are otherwise detrimental to cell survival 
and/or function; (6) DNA segments that otherwise inhibit the 
activity of any of the DNA segments described in Nos. 1-5 
above (e.g., antisense oligonucleotides); (7) DNA segments 
that bind products that modify a substrate (e.g., restriction
endonucleases); (8) DNA segments that can be used to
isolate a desired molecule (e.g., specific protein binding
sites); (9) DNA segments that encode a specific nucleotide 
sequence which can be otherwise non-functional (e.g., for 
PCR amplification of subpopulations of molecules); and/or 
(10) DNA segments, which when absent, directly or indi- 
rectly confer sensitivity to particular compounds. 

Selection scheme: is any method which allows selection, 
enrichment, or identification of a desired Product or Products
from a mixture containing the Insert Donor, Vector
Donor, any intermediates (e.g., a Cointegrate), and/or
Byproducts. The selection schemes of one preferred
embodiment have at least two components that are either 
linked or unlinked during recombinational cloning. One 
component is a Selectable marker. The other component 
controls the expression in vitro or in vivo of the Selectable 
marker, or survival of the cell harboring the plasmid carrying 
the Selectable marker. Generally, this controlling element 
will be a repressor or inducer of the Selectable marker, but 
other means for controlling expression of the Selectable 
marker can be used. Whether a repressor or activator is used 
will depend on whether the marker is for a positive or 
negative selection, and the exact arrangement of the various 
DNA segments, as will be readily apparent to those skilled 
in the art. A preferred requirement is that the selection 
scheme results in selection of or enrichment for only one or 
more desired Products. As defined herein, to select for a 
DNA molecule includes (a) selecting or enriching for the 
presence of the desired DNA molecule, and (b) selecting or 
enriching against the presence of DNA molecules that are 
not the desired DNA molecule. 

In one embodiment, the selection schemes (which can be
carried out reversed) will take one of three forms, which will
be discussed in terms of FIG. 1. The first, exemplified herein
with a Selectable marker and a repressor therefor, selects for
molecules having segment D and lacking segment C. The
second selects against molecules having segment C and for
molecules having segment D. Possible embodiments of the
second form would have a DNA segment carrying a gene
toxic to cells into which the in vitro reaction products are to
be introduced. A toxic gene can be a DNA that is expressed
as a toxic gene product (a toxic protein or RNA), or can be
toxic in and of itself. (In the latter case, the toxic gene is
understood to carry its classical definition of "heritable
trait".)

Examples of such toxic gene products are well known in
the art, and include, but are not limited to, restriction
endonucleases (e.g., DpnI) and genes that kill hosts in the
absence of a suppressing function, e.g., kicB. A toxic gene
can alternatively be selectable in vitro, e.g., a restriction site.

In the second form, segment D carries a Selectable
marker. The toxic gene would eliminate transformants
harboring the Vector Donor, Cointegrate, and Byproduct
molecules, while the Selectable marker can be used to select
for cells containing the Product and against cells harboring
only the Insert Donor.
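
The first two selection forms can be illustrated with a minimal sketch, assuming each molecule is modeled only as the set of FIG. 1 segments it carries (A = Insert, B = original cloning vector, C = the segment selected against, D = subcloning vector with the Selectable marker). The names `MOLECULES` and `survives_selection` are hypothetical, and the model ignores copy number, topology, and cells that take up more than one molecule.

```python
# Each molecule is reduced to the set of FIG. 1 segments it contains.
# This composition follows the definitions of Insert Donor, Vector Donor,
# Cointegrate, Product, and Byproduct given in the text.
MOLECULES = {
    "Insert Donor": frozenset("AB"),
    "Vector Donor": frozenset("CD"),
    "Cointegrate": frozenset("ABCD"),
    "Product": frozenset("AD"),
    "Byproduct": frozenset("BC"),
}


def survives_selection(segments):
    """Select for segment D (marker) and against segment C.

    In the first form, C carries a repressor of the marker; in the second
    form, C carries a toxic gene. Either way a transformant survives only
    if its molecule has D and lacks C.
    """
    return "D" in segments and "C" not in segments


survivors = [name for name, segs in MOLECULES.items() if survives_selection(segs)]
```

Under these simplifying assumptions only the Product passes: the Insert Donor lacks the marker on D, while the Vector Donor, Cointegrate, and Byproduct all carry segment C.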

The third form selects for cells that have both segments A
and D in cis on the same molecule, but not for cells that have
both segments in trans on different molecules. This could be
embodied by a Selectable marker that is split into two
inactive fragments, one each on segments A and D.

The fragments are so arranged relative to the
recombination sites that when the segments are brought together by the
recombination event, they reconstitute a functional
Selectable marker. For example, the recombinational event can
link a promoter with a structural gene, can link two
fragments of a structural gene, or can link genes that encode a
heterodimeric gene product needed for survival, or can link
portions of a replicon.
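
The cis versus trans distinction of the third form can be sketched as follows, assuming a cell is modeled as a list of molecules and each molecule as a set of FIG. 1 segments. This is a deliberately coarse illustration: it treats any molecule carrying both A and D as reconstituting the marker and does not model the arrangement of the fragments relative to the recombination sites. The function names are hypothetical.

```python
def marker_is_functional(molecule):
    """A split marker is treated as functional only when both inactive
    fragments (one on segment A, one on segment D) are on one molecule."""
    return {"A", "D"} <= set(molecule)


def cell_survives(molecules_in_cell):
    """A cell survives selection if any single molecule it harbors
    carries a reconstituted marker."""
    return any(marker_is_functional(m) for m in molecules_in_cell)


# Product: segments A and D in cis on the same molecule.
cell_with_product = [{"A", "D"}]

# Both parental molecules in one cell: A and D present, but in trans.
cell_with_both_parents = [{"A", "B"}, {"C", "D"}]
```

The point of the design is that co-transformation with both unrecombined parents does not rescue a cell, because the two marker fragments never sit on the same molecule.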

Site-specific recombinase: is a type of recombinase which
typically has at least the following four activities: (1)
recognition of one or two specific DNA sequences; (2)
cleavage of said DNA sequence or sequences; (3) DNA
topoisomerase activity involved in strand exchange; and (4)
DNA ligase activity to reseal the cleaved strands of DNA.
See Sauer, B., Current Opinions in Biotechnology
5:521-527 (1994). Conservative site-specific recombination
is distinguished from homologous recombination and
transposition by a high degree of specificity for both partners. The
strand exchange mechanism involves the cleavage and
rejoining of specific DNA sequences in the absence of DNA
synthesis (Landy, A. (1989) Ann. Rev. Biochem.
58:913-949).

Subcloning vector: is a cloning vector comprising a
circular or linear DNA molecule which includes an
appropriate replicon. In the present invention, the subcloning
vector (segment D in FIG. 1) can also contain functional
and/or regulatory elements that are desired to be
incorporated into the final product to act upon or with the cloned
DNA Insert (segment A in FIG. 1). The subcloning vector
can also contain a Selectable marker (contained in segment
C in FIG. 1).

Vector: is a DNA that provides a useful biological or
biochemical property to an Insert. Examples include
plasmids, phages, and other DNA sequences which are able
to replicate or be replicated in vitro or in a host cell, or to
convey a desired DNA segment to a desired location within
a host cell. A Vector can have one or more restriction
endonuclease recognition sites at which the DNA sequences
can be cut in a determinable fashion without loss of an
essential biological function of the vector, and into which a
DNA fragment can be spliced in order to bring about its
replication and cloning. Vectors can further provide primer
sites, e.g., for PCR, transcriptional and/or translational
initiation and/or regulation sites, recombinational signals,
replicons, Selectable markers, etc. Clearly, methods of
inserting a desired DNA fragment which do not require the
use of homologous recombination or restriction enzymes
(such as, but not limited to, UDG cloning of PCR fragments
(U.S. Pat. No. 5,334,575, entirely incorporated herein by
reference), T:A cloning, and the like) can also be applied to
clone a fragment of DNA into a cloning vector to be used
according to the present invention. The cloning vector can
further contain a Selectable marker suitable for use in the
identification of cells transformed with the cloning vector.

Vector Donor: is one of the two parental DNA molecules 
of the present invention which carries the DNA segments 
encoding the DNA vector which is to become part of the 
desired Product. The Vector Donor comprises a subcloning 
vector D (or it can be called the cloning vector if the Insert 
Donor does not already contain a cloning vector) and a 
segment C flanked by recombination sites (see FIG. 1). 
Segments C and/or D can contain elements that contribute to 
selection for the desired Product daughter molecule, as 
described above for selection schemes. The recombination 
signals can be the same or different, and can be acted upon 
by the same or different recombinases. In addition, the 
Vector Donor can be linear or circular. 

Description 

One general scheme for an in vitro or in vivo method of 
the invention is shown in FIG. 1, where the Insert Donor and 
the Vector Donor can be either circular or linear DNA, but 
is shown as circular. Vector D is exchanged for the original 
cloning vector A. It is desirable to select for the daughter 
vector containing elements A and D and against other 
molecules, including one or more Cointegrate(s). The square 
and circle are different sets of recombination sites (e.g., lox 
sites or att sites). Segment A or D can contain at least one 
Selection Marker, expression signals, origins of replication, 
or specialized functions for detecting, selecting, expressing, 
mapping or sequencing DNA, where D is used in this 
example. 

Examples of desired DNA segments that can be part of 
Element A or D include, but are not limited to, PCR 
products, large DNA segments, genomic clones or 
fragments, cDNA clones, functional elements, etc., and 
genes or partial genes, which encode useful nucleic acids or 
proteins. Moreover, the recombinational cloning of the 
present invention can be used to make ex vivo and in vivo 
gene transfer vehicles for protein expression and/or gene 
therapy. 

In FIG. 1, the scheme provides the desired Product as 
containing vectors D and A, as follows. The Insert Donor 
(containing A and B) is first recombined at the square 
recombination sites by recombination proteins, with the 
Vector Donor (containing C and D), to form a Co-integrate 
having each of A-D-C-B. Next, recombination occurs at the 
circle recombination sites to form Product DNA (A and D) 
and Byproduct DNA (C and B). However, if desired, two or 
more different Co-integrates can be formed to generate two 
or more Products. 
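The two-step scheme of FIG. 1 can be traced with a small sketch. It is illustrative only and rests on a simplified model that is not part of the original disclosure: each circular molecule is a Python list of segment names and site labels read once around the circle, the labels `sq` and `ci` stand for the square and circle recombination sites, and site orientation and the hybrid sites created by each crossover are ignored. All function names are hypothetical.

```python
SITES = {"sq", "ci"}  # assumed labels for the square and circle sites


def rotate_to_end(circle, site):
    """Rotate a circular molecule so the first copy of `site` is its last element."""
    i = circle.index(site)
    return circle[i + 1:] + circle[:i + 1]


def integrate(circle1, circle2, site):
    """Fuse two circles at a shared site, giving one larger circle (a Cointegrate)."""
    return rotate_to_end(circle1, site) + rotate_to_end(circle2, site)


def resolve(circle, site):
    """Split a circle carrying two copies of `site` into two daughter circles."""
    c = rotate_to_end(circle, site)
    j = c.index(site)  # position of the other copy
    return c[:j + 1], c[j + 1:]


def segment_order(circle, start):
    """List the segments around a circle, beginning at `start`, skipping sites."""
    i = circle.index(start)
    return [x for x in circle[i:] + circle[:i] if x not in SITES]


insert_donor = ["A", "sq", "B", "ci"]
vector_donor = ["C", "sq", "D", "ci"]

cointegrate = integrate(insert_donor, vector_donor, "sq")
product, byproduct = resolve(cointegrate, "ci")

if __name__ == "__main__":
    print("Cointegrate:", "-".join(segment_order(cointegrate, "A")))
    print("Product:", segment_order(product, "A"))
    print("Byproduct:", segment_order(byproduct, "C"))
```

In this model the first crossover at the square sites gives the A-D-C-B order described above, and the second crossover at the circle sites separates segments A and D from segments C and B.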

In one embodiment of the present in vitro or in vivo 
recombinational cloning method, a method for selecting at 
least one desired Product DNA is provided. This can be 
understood by consideration of the map of plasmid 
pEZC726 depicted in FIG. 2. The two exemplary recombi- 
nation sites are attP and loxP. On one segment defined by 
these sites is a kanamycin resistance gene whose promoter 
has been replaced by the tetOP operator/promoter from
transposon Tn10. In the absence of tet repressor protein, E.
coli RNA polymerase transcribes the kanamycin resistance
gene from the tetOP. If tet repressor is present, it binds to
tetOP and blocks transcription of the kanamycin resistance
gene. The other segment of pEZC726 has the tet repressor
gene expressed by a constitutive promoter. Thus cells
transformed by pEZC726 are resistant to chloramphenicol,
because of the chloramphenicol acetyl transferase gene on
the same segment as tetR, but are sensitive to kanamycin.
The recombination reactions result in separation of the tetR
gene from the regulated kanamycin resistance gene. This
separation results in kanamycin resistance in cells receiving
the desired recombination Product.
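The selection logic of this embodiment can be written as a short truth-table sketch. It is a simplification under stated assumptions: a cell is described only by the set of genetic elements it carries, and the element names `tetOP-kan`, `tetR` and `cat` are hypothetical labels for the tetOP-controlled kanamycin resistance gene, the tet repressor gene and the chloramphenicol acetyl transferase gene.

```python
def phenotype(elements):
    """Predict drug resistance from the set of elements present in a cell.

    Assumed rules, taken from the description above:
    - chloramphenicol resistance requires the cat gene;
    - kanamycin resistance requires the tetOP-controlled kan gene
      and the absence of the tet repressor gene.
    """
    return {
        "chloramphenicol": "cat" in elements,
        "kanamycin": "tetOP-kan" in elements and "tetR" not in elements,
    }


# Parental plasmid pEZC726 carries all three elements.
PEZC726 = {"tetOP-kan", "tetR", "cat"}
# The desired Product keeps the regulated kan gene but has lost tetR
# (only the elements relevant to selection are listed).
PRODUCT = {"tetOP-kan"}

if __name__ == "__main__":
    print("pEZC726:", phenotype(PEZC726))
    print("Product:", phenotype(PRODUCT))
```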

Two different sets of plasmids were constructed to
demonstrate the in vitro method. One set, for use with Cre
recombinase only (cloning vector 602 and subcloning vector
629 (FIG. 3)) contained loxP and loxP 511 sites. A second
set, for use with Cre and integrase (cloning vector 705 and
subcloning vector 726 (FIG. 2)) contained loxP and att sites.
The efficiency of production of the desired daughter plasmid
was about 60 fold higher using both enzymes than using Cre
alone. Nineteen of twenty four colonies from the Cre-only
reaction contained the desired product, while thirty eight of
thirty eight colonies from the integrase plus Cre reaction
contained the desired product plasmid.
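The colony counts quoted above can be turned into fractions with a few lines of arithmetic. This is only a convenience sketch; the function name is hypothetical, and the fraction of correct colonies is a different quantity from the roughly 60-fold difference in production efficiency stated in the text.

```python
from fractions import Fraction


def fraction_correct(correct, screened):
    """Fraction of screened colonies that carried the desired product."""
    return Fraction(correct, screened)


cre_only = fraction_correct(19, 24)
int_plus_cre = fraction_correct(38, 38)

if __name__ == "__main__":
    print(f"Cre only:        {float(cre_only):.1%}")      # 79.2%
    print(f"Integrase + Cre: {float(int_plus_cre):.1%}")  # 100.0%
```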

Other Selection Schemes. A variety of selection schemes
can be used that are known in the art as they can suit a
particular purpose for which the recombinational cloning is
carried out. Depending upon individual preferences and
needs, a number of different types of selection schemes can
be used in the recombinational cloning method of the
present invention. The skilled artisan can take advantage of
the availability of the many DNA segments or methods for
making them and the different methods of selection that are
routinely used in the art. Such DNA segments include but
are not limited to those which encode an activity such as,
but not limited to, production of RNA, peptide, or protein,
or providing a binding site for such RNA, peptide, or
protein. Examples of DNA molecules used in devising a
selection scheme are given above, under the definition of
"selection scheme".

Additional examples include but are not limited to:

(i) Generation of new primer sites for PCR (e.g.,
juxtaposition of two DNA sequences that were not
previously juxtaposed);

(ii) Inclusion of a DNA sequence acted upon by a
restriction endonuclease or other DNA modifying enzyme,
chemical, ribozyme, etc.;

(iii) Inclusion of a DNA sequence recognized by a DNA
binding protein, RNA, DNA, chemical, etc. (e.g., for
use as an affinity tag for selecting for or excluding from
a population) (Davis, Nucl. Acids Res. 24:702-706
(1996); J. Virol. 69:8027-8034 (1995));

(iv) In vitro selection of RNA ligands for the ribosomal
L22 protein associated with Epstein-Barr virus-expressed
RNA by using randomized and cDNA-derived RNA
libraries;

(vi) The positioning of functional elements whose activity
requires a specific orientation or juxtaposition (e.g., (a)
a recombination site which reacts poorly in trans, but
when placed in cis, in the presence of the appropriate
proteins, results in recombination that destroys certain
populations of molecules; (e.g., reconstitution of a
promoter sequence that allows in vitro RNA synthesis).
The RNA can be used directly, or can be reverse
transcribed to obtain the desired DNA construct;

(vii) Selection of the desired product by size (e.g.,
fractionation) or other physical property of the
molecule(s); and

(viii) Inclusion of a DNA sequence required for a specific
modification (e.g., methylation) that allows its
identification.

After formation of the Product and Byproduct in the
method of the present invention, the selection step can be
carried out either in vitro or in vivo depending upon the
particular selection scheme which has been optionally
devised in the particular recombinational cloning procedure.

For example, an in vitro method of selection can be
devised for the Insert Donor and Vector Donor DNA
molecules. Such scheme can involve engineering a rare
restriction site in the starting circular vectors in such a way that
after the recombination events the rare cutting sites end up
in the Byproduct. Hence, when the restriction enzyme which
binds and cuts at the rare restriction site is added to the
reaction mixture in vitro, all of the DNA molecules carrying
the rare cutting site, i.e., the starting DNA molecules, the
Cointegrate, and the Byproduct, will be cut and rendered
nonreplicable in the intended host cell. For example, cutting
sites in segments B and C (see FIG. 1) can be used to select
against all molecules except the Product. Alternatively, only
a cutting site in C is needed if one is able to select for
segment D, e.g., by a drug resistance gene not found on B.

Similarly, an in vitro selection method can be devised
when dealing with linear DNA molecules. DNA sequences
complementary to a PCR primer sequence can be so
engineered that they are transferred, through the recombinational
cloning method, only to the Product molecule. After the
reactions are completed, the appropriate primers are added
to the reaction solution and the sample is subjected to PCR.
Hence, all or part of the Product molecule is amplified.

Other in vivo selection schemes can be used with a variety
of E. coli cell lines. One is to put a repressor gene on one
segment of the subcloning plasmid, and a drug marker
controlled by that repressor on the other segment of the same
plasmid. Another is to put a killer gene on segment C of the
subcloning plasmid (FIG. 1). Of course a way must exist for
growing such a plasmid, i.e., there must exist circumstances
under which the killer gene will not kill. There are a number
of these genes known which require particular strains of E.
coli. One such scheme is to use the restriction enzyme DpnI,
which will not cleave unless its recognition sequence GATC
is methylated. Many popular common E. coli strains
methylate GATC sequences, but there are mutants in which cloned
DpnI can be expressed without harm.

Of course analogous selection schemes can be devised for
other host organisms. For example, the tet repressor/operator
of Tn10 has been adapted to control gene expression in
eukaryotes (Gossen, M., and Bujard, H., Proc. Natl. Acad.
Sci. USA 89:5547-5551 (1992)). Thus the same control of
drug resistance by the tet repressor exemplified herein can
be applied to select for Product in eukaryotic cells.

Recombination Proteins

In the present invention, the exchange of DNA segments
is achieved by the use of recombination proteins, including
recombinases and associated co-factors and proteins.
Various recombination proteins are described in the art.
Examples of such recombinases include:

Cre: A protein from bacteriophage P1 (Abremski and
Hoess, J. Biol. Chem. 259(3):1509-1514 (1984)) catalyzes
the exchange (i.e., causes recombination) between 34 bp
DNA sequences called loxP (locus of crossover) sites (See
Hoess et al., Nucl. Acids Res. 14(5):2287 (1986)). Cre is
available commercially (Novagen, Catalog No. 69247-1).
Recombination mediated by Cre is freely reversible. From
thermodynamic considerations it is not surprising that
Cre-mediated integration (recombination between two molecules
to form one molecule) is much less efficient than
Cre-mediated excision (recombination between two loxP sites in
the same molecule to form two daughter molecules). Cre
works in simple buffers with either magnesium or
spermidine as a cofactor, as is well known in the art. The DNA
substrates can be either linear or supercoiled. A number of
mutant loxP sites have been described (Hoess et al., supra).
One of these, loxP 511, recombines with another loxP 511
site, but will not recombine with a loxP site.

Integrase: A protein from bacteriophage lambda that
mediates the integration of the lambda genome into the E.
coli chromosome. The bacteriophage λ Int recombinational
proteins promote irreversible recombination between its
substrate att sites as part of the formation or induction of a
lysogenic state. Reversibility of the recombination reactions
results from two independent pathways for integrative and
excisive recombination. Each pathway uses a unique, but
overlapping, set of the 15 protein binding sites that comprise
att site DNAs. Cooperative and competitive interactions
involving four proteins (Int, Xis, IHF and FIS) determine the
direction of recombination.

Integrative recombination involves the Int and IHF
proteins and sites attP (240 bp) and attB (25 bp). Recombination
results in the formation of two new sites: attL and attR.
Excisive recombination requires Int, IHF, and Xis, and sites
attL and attR to generate attP and attB. Under certain
conditions, FIS stimulates excisive recombination. In
addition to these normal reactions, it should be appreciated that
attP and attB, when placed on the same molecule, can
promote excisive recombination to generate two excision
products, one with attL and one with attR. Similarly,
intermolecular recombination between molecules containing attL
and attR, in the presence of Int, IHF and Xis, can result in
integrative recombination and the generation of attP and attB.
Hence, by flanking DNA segments with appropriate
combinations of engineered att sites, in the presence of the
appropriate recombination proteins, one can direct excisive
or integrative recombination, as reverse reactions of each
other.

Each of the att sites contains a 15 bp core sequence;
individual sequence elements of functional significance lie
within, outside, and across the boundaries of this common
core (Landy, A., Ann. Rev. Biochem. 58:913 (1989)).
Efficient recombination between the various att sites requires
that the sequence of the central common region be identical
between the recombining partners; however, the exact
sequence is now found to be modifiable. Consequently,
derivatives of the att site with changes within the core are
now discovered to recombine at least as efficiently as the
native core sequences.

Integrase acts to recombine the attP site on bacteriophage
lambda (about 240 bp) with the attB site on the E. coli
genome (about 25 bp) (Weisberg, R. A. and Landy, A. in
Lambda II, p. 211 (1983), Cold Spring Harbor Laboratory),
to produce the integrated lambda genome flanked by attL
(about 100 bp) and attR (about 160 bp) sites. In the absence
of Xis (see below), this reaction is essentially irreversible.
The integration reaction mediated by integrase and IHF
works in vitro, with simple buffer containing spermidine.
Integrase can be obtained as described by Nash, H. A.,
Methods of Enzymology 100:210-216 (1983). IHF can be
obtained as described by Filutowicz, M., et al., Gene
147:149-150 (1994).

In the presence of the λ protein Xis (excise), integrase
catalyzes the reaction of attR and attL to form attP and attB, 
i.e., it promotes the reverse of the reaction described above. 
This reaction can also be applied in the present invention. 
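The forward and reverse att reactions can be summarized as a lookup table. The sketch below is a deliberately coarse model under stated assumptions: it encodes only the site pairs and protein requirements named in the text (Int and IHF for the attB and attP reaction; Int, IHF and Xis for the attL and attR reaction), it treats any additional proteins as harmless, and the names `REACTIONS` and `recombine` are hypothetical.

```python
# Site pair -> (proteins required, sites produced), as described in the text.
REACTIONS = {
    frozenset({"attB", "attP"}): ({"Int", "IHF"}, ("attL", "attR")),
    frozenset({"attL", "attR"}): ({"Int", "IHF", "Xis"}, ("attB", "attP")),
}


def recombine(site_a, site_b, proteins):
    """Return the product sites, or None if the pair or proteins do not allow it."""
    entry = REACTIONS.get(frozenset({site_a, site_b}))
    if entry is None:
        return None  # not a recombining pair in this simplified table
    required, products = entry
    return products if required <= set(proteins) else None


if __name__ == "__main__":
    print(recombine("attP", "attB", {"Int", "IHF"}))         # integrative
    print(recombine("attL", "attR", {"Int", "IHF"}))         # no Xis: None
    print(recombine("attL", "attR", {"Int", "IHF", "Xis"}))  # excisive
```

The table makes the point of the passage explicit: without Xis the attL and attR pair does not react in this model, which corresponds to the integration reaction being essentially irreversible.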

Other Recombination Systems. Numerous recombination
systems from various organisms can also be used, based on
the teaching and guidance provided herein. See, e.g., Hoess
et al., Nucleic Acids Research 14(6):2287 (1986); Abremski
et al., J. Biol. Chem. 261(1):391 (1986); Campbell, J.
Bacteriol. 174(23):7495 (1992); Qian et al., J. Biol. Chem.
267(11):7794 (1992); Araki et al., J. Mol. Biol. 225(1):25
(1992). Many of these belong to the integrase family of
recombinases (Argos et al. EMBO J. 5:433-440 (1986)).
Perhaps the best studied of these are the Integrase/att system
from bacteriophage λ (Landy, A. (1993) Current Opinions in
Genetics and Devel. 3:699-707), the Cre/loxP system from
bacteriophage P1 (Hoess and Abremski (1990) In Nucleic
Acids and Molecular Biology, vol. 4. Eds.: Eckstein and
Lilley, Berlin-Heidelberg: Springer-Verlag; pp. 90-109),
and the FLP/FRT system from the Saccharomyces cerevisiae
2 μ circle plasmid (Broach et al. Cell 29:227-234 (1982)).

Members of a second family of site-specific
recombinases, the resolvase family (e.g., γδ, Tn3 resolvase,
Hin, Gin, and Cin) are also known. Members of this highly
related family of recombinases are typically constrained to
intramolecular reactions (e.g., inversions and excisions) and
can require host-encoded factors. Mutants have been
isolated that relieve some of the requirements for host factors
(Maeser and Kahnmann (1991) Mol. Gen. Genet.
230:170-176), as well as some of the constraints of
intramolecular recombination.

Other site-specific recombinases similar to λ Int and
similar to P1 Cre can be substituted for Int and Cre. Such
recombinases are known. In many cases the purification of 
such other recombinases has been described in the art. In 
cases when they are not known, cell extracts can be used or 
the enzymes can be partially purified using procedures 
described for Cre and Int. 

While Cre and Int are described in detail for reasons of 
example, many related recombinase systems exist and their 
application to the described invention is also provided 
according to the present invention. The integrase family of 
site-specific recombinases can be used to provide alternative 
recombination proteins and recombination sites for the 
present invention, as site-specific recombination proteins 
encoded by bacteriophage lambda, phi 80, P22, P2, 186, P4
and P1. This group of proteins exhibits an unexpectedly
large diversity of sequences. Despite this diversity, all of the 
recombinases can be aligned in their C-terminal halves. 

A 40-residue region near the C terminus is particularly 
well conserved in all the proteins and is homologous to a 
region near the C terminus of the yeast 2 mu plasmid Flp 
protein. Three positions are perfectly conserved within this 
family: histidine, arginine and tyrosine are found at respec- 
tive alignment positions 396, 399 and 433 within the well- 
conserved C-terminal region. These residues contribute to 
the active site of this family of recombinases, and suggest 
that tyrosine-433 forms a transient covalent linkage to DNA 
during strand cleavage and rejoining. See, e.g., Argos, P. et
al., EMBO J. 5:433-40 (1986).

Alternatively, IS231 and other Bacillus thuringiensis 
transposable elements could be used as recombination pro- 
teins and recombination sites. Bacillus thuringiensis is an 
entomopathogenic bacterium whose toxicity is due to the 
presence in the sporangia of delta-endotoxin crystals active 
against agricultural pests and vectors of human and animal
diseases. Most of the genes coding for these toxin proteins
are plasmid-borne and are generally structurally associated
with insertion sequences (IS231, IS232, IS240, ISBT1 and
ISBT2) and transposons (Tn4430 and Tn5401). Several of
these mobile elements have been shown to be active and
participate in the crystal gene mobility, thereby contributing
to the variation of bacterial toxicity.

Structural analysis of the iso-IS231 elements indicates
that they are related to IS1151 from Clostridium perfringens
and distantly related to IS4 and IS186 from Escherichia coli.
Like the other IS4 family members, they contain a conserved
transposase-integrase motif found in other IS families and
retroviruses.
Moreover, functional data gathered from IS231A in
Escherichia coli indicate a non-replicative mode of
transposition, with a preference for specific targets. Similar
results were also obtained in Bacillus subtilis and B.
thuringiensis. See, e.g., Mahillon, J. et al., Genetica 93:13-26
(1994); Campbell, J. Bacteriol. 7495-7499 (1992).

The amount of recombinase which is added to drive the
recombination reaction can be determined by using known
assays. Specifically, a titration assay is used to determine the
appropriate amount of a purified recombinase enzyme, or
the appropriate amount of an extract.

Engineered Recombination Sites. The above
recombinases and corresponding recombinase sites are suitable for
use in recombination cloning according to the present
invention. However, wild-type recombination sites contain
sequences that reduce the efficiency or specificity of
recombination reactions as applied in methods of the present
invention. For example, multiple stop codons in attB, attR,
attP, attL and loxP recombination sites occur in multiple
reading frames on both strands, so recombination
efficiencies are reduced, e.g., where the coding sequence must cross
the recombination sites (only one reading frame is available
on each strand of loxP and attB sites), or impossible (in attP,
attR or attL).
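Whether a given site blocks translation in a particular frame can be checked by scanning all six reading frames for stop codons. The sketch below is one way to do that, assuming the standard stop codons TAA, TAG and TGA and a plain DNA string as input; the function names are hypothetical, and the example sequence is the attB1 core listed later in this description as SEQ ID NO:6.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def open_frames(seq):
    """Map (strand, frame) to True when that reading frame has no stop codon."""
    seq = seq.upper()
    result = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            codons = [s[i:i + 3] for i in range(frame, len(s) - 2, 3)]
            result[(strand, frame)] = not any(c in STOP_CODONS for c in codons)
    return result


ATTB1_CORE = "AGCCTGCTTTTTTGTACAAACTTGT"  # SEQ ID NO:6 as printed in this document

if __name__ == "__main__":
    for (strand, frame), is_open in sorted(open_frames(ATTB1_CORE).items()):
        print(strand, frame, "open" if is_open else "stop codon present")
```

A frame reported as open can carry a coding sequence across the site; a frame containing a stop codon cannot.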

Accordingly, the present invention also provides
engineered recombination sites that overcome these problems.
For example, att sites can be engineered to have one or
multiple mutations to enhance specificity or efficiency of the
recombination reaction and the properties of Product DNAs
(e.g., att1, att2, and att3 sites); to decrease reverse reaction
(e.g., removing P1 and H1 from attB). The testing of these
mutants determines which mutants yield sufficient
recombinational activity to be suitable for recombination
subcloning according to the present invention.

Mutations can therefore be introduced into recombination
sites for enhancing site specific recombination. Such
mutations include, but are not limited to: recombination sites
without translation stop codons that allow fusion proteins to
be encoded; recombination sites recognized by the same
proteins but differing in base sequence such that they react
largely or exclusively with their homologous partners allow
multiple reactions to be contemplated. Which particular
reactions take place can be specified by which particular
partners are present in the reaction mixture. For example, a
tripartite protein fusion could be accomplished with parental
plasmids containing recombination sites attR1 and attR2;
attL1 and attL3; and/or attR3 and attL2.

There are well known procedures for introducing specific
mutations into nucleic acid sequences. A number of these are
described in Ausubel, F. M. et al., Current Protocols in
Molecular Biology, Wiley Interscience, New York
(1989-1996). Mutations can be designed into
oligonucleotides, which can be used to modify existing
cloned sequences, or in amplification reactions. Random
mutagenesis can also be employed if appropriate selection 
methods are available to isolate the desired mutant DNA or 
RNA. The presence of the desired mutations can be con- 
firmed by sequencing the nucleic acid by well known 
methods. 

The following non-limiting methods can be used to engi- 
neer a core region of a given recombination site to provide 
mutated sites suitable for use in the present invention: 

1. By recombination of two parental DNA sequences by
site-specific (e.g., attL and attR to give attB) or other
(e.g., homologous) recombination mechanisms, the
parental DNA segments containing one or more
base alterations that result in the final core sequence;

2. By mutation or mutagenesis (site-specific, PCR,
random, spontaneous, etc.) directly of the desired core
sequence;

3. By mutagenesis (site-specific, PCR, random,
spontaneous, etc.) of parental DNA sequences, which
are recombined to generate a desired core sequence;
and

4. By reverse transcription of an RNA encoding the 
desired core sequence. 

The functionality of the mutant recombination sites can be 
demonstrated in ways that depend on the particular charac- 
teristic that is desired. For example, the lack of translation 
stop codons in a recombination site can be demonstrated by 
expressing the appropriate fusion proteins. Specificity of 
recombination between homologous partners can be dem- 
onstrated by introducing the appropriate molecules into in 
vitro reactions, and assaying for recombination products as 
described herein or known in the art. Other desired muta- 
tions in recombination sites might include the presence or 
absence of restriction sites, translation or transcription start 
signals, protein binding sites, and other known functional- 
ities of nucleic acid base sequences. Genetic selection 
schemes for particular functional attributes in the recombi- 
nation sites can be used according to known method steps. 
For example, the modification of sites to provide (from a 
pair of sites that do not interact) partners that do interact 
could be achieved by requiring deletion, via recombination 
between the sites, of a DNA sequence encoding a toxic 
substance. Similarly, selection for sites that remove trans- 
lation stop sequences, the presence or absence of protein 
binding sites, etc., can be easily devised by those skilled in 
the art. 

Accordingly, the present invention provides a nucleic acid 
molecule, comprising at least one DNA segment having at 
least two engineered recombination sites flanking a Select- 
able marker and/or a desired DNA segment, wherein at least 
one of said recombination sites comprises a core region 
having at least one engineered mutation that enhances 
recombination in vitro in the formation of a Cointegrate 
DNA or a Product DNA. 

The nucleic acid molecule can have at least one mutation 
that confers at least one enhancement of said recombination, 
said enhancement selected from the group consisting of 
substantially (i) favoring excisive integration; (ii) favoring
excisive recombination; (iii) relieving the requirement for
host factors; (iv) increasing the efficiency of said
Cointegrate DNA or Product DNA formation; and (v) increasing
the specificity of said Cointegrate DNA or Product DNA
formation.

The nucleic acid molecule preferably comprises at least 
one recombination site derived from attB, attP, attL or attR. 
More preferably the att site is selected from att1, att2, or att3,
as described herein.

In a preferred embodiment, the core region comprises a
DNA sequence selected from the group consisting of:

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB)
(SEQ ID NO:2);

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR)
(SEQ ID NO:3);

(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

or a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A or C or G or T/U; S=C or G; and B=C or
G or T/U, as presented in 37 C.F.R. §1.822, which is entirely
incorporated herein by reference, wherein the core region
does not contain a stop codon in one or more reading frames.
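The degenerate symbols in these consensus sequences can be expanded mechanically. The following sketch, offered only as an illustration, converts a consensus written with the symbols defined above into a regular expression and counts how many concrete DNA sequences it covers. It assumes DNA input (T rather than U), handles only the symbols used in this passage, and uses hypothetical function names; the example consensus is m-attB (SEQ ID NO:2) as printed.

```python
import re

# Symbols as defined in the passage above (DNA alphabet only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "W": "AT",
    "S": "CG", "B": "CGT", "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Compile a degenerate consensus into a regular expression."""
    parts = []
    for symbol in consensus.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return re.compile("".join(parts))


def count_sequences(consensus):
    """Number of distinct concrete sequences the consensus represents."""
    total = 1
    for symbol in consensus.upper():
        total *= len(IUPAC[symbol])
    return total


M_ATTB = "AGCCWGCTTTYKTRTACNAACTSGB"  # SEQ ID NO:2 as printed

if __name__ == "__main__":
    pattern = consensus_to_regex(M_ATTB)
    print(pattern.pattern)
    print(count_sequences(M_ATTB), "sequences covered")
    print(bool(pattern.fullmatch("AGCCAGCTTTCGTATACAAACTCGC")))
```

A candidate core can then be tested with `fullmatch`, and combined with a stop-codon scan when the requirement for an open reading frame also applies.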

The core region also preferably comprises a DNA
sequence selected from the group consisting of:

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTRCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1)
(SEQ ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (attP2,P3)
(SEQ ID NO:16); or a corresponding or complementary
DNA or RNA sequence.

The present invention thus also provides a method for
making a nucleic acid molecule, comprising providing a
nucleic acid molecule having at least one engineered
recombination site comprising at least one DNA sequence having
at least 80-99% homology (or any range or value therein) to
at least one of SEQ ID NOS:1-16, or any suitable
recombination site, or which hybridizes under stringent conditions
thereto, as known in the art.
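One simple way to put a number on sequence similarity is ungapped percent identity between two sequences of equal length. The sketch below makes that assumption explicitly; the text does not say how "homology" is to be computed, so this is only one possible reading, and the function name is hypothetical. The two example sequences are the attB1 and attB2 cores (SEQ ID NOS:6 and 7) as printed above.

```python
def percent_identity(seq_a, seq_b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


ATTB1 = "AGCCTGCTTTTTTGTACAAACTTGT"  # SEQ ID NO:6
ATTB2 = "AGCCTGCTTTCTTGTACAAACTTGT"  # SEQ ID NO:7

if __name__ == "__main__":
    print(f"attB1 vs attB2: {percent_identity(ATTB1, ATTB2):.1f}% identical")
```

The two cores differ at a single position out of 25, so they fall inside the 80-99% range under this measure.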

Clearly, there are various types and permutations of such
well-known in vitro and in vivo selection methods, each of
which is not described herein for the sake of brevity.
However, such variations and permutations are
contemplated and considered to be the different embodiments of the
present invention.

It is important to note that as a result of the preferred
embodiment being in vitro recombination reactions,
non-biological molecules such as PCR products can be
manipulated via the present recombinational cloning method. In one
example, it is possible to clone linear molecules into circular
vectors. There are a number of applications for the present
invention. These uses include, but are not limited to,
changing vectors, apposing promoters with genes, constructing
genes for fusion proteins, changing copy number, changing
replicons, cloning into phages, and cloning, e.g., PCR
products (with an attB site at one end and a loxP site at the other
end), genomic DNAs, and cDNAs.

The following examples are intended to further illustrate 
certain preferred embodiments of the invention and are not 
intended to be limiting in nature. 

EXAMPLES 

The present recombinational cloning method accom- 
plishes the exchange of nucleic acid segments to render 
something useful to the user, such as a change of cloning 
vectors. These segments must be flanked on both sides by 
recombination signals that are in the proper orientation with 
respect to one another. In the examples below the two 
parental nucleic acid molecules (e.g., plasmids) are called 
the Insert Donor and the Vector Donor. The Insert Donor 
contains a segment that will become joined to a new vector 
contributed by the Vector Donor. The recombination 
intermediate(s) that contain(s) both starting molecules is 
called the Cointegrate(s). The second recombination event 
produces two daughter molecules, called the Product (the 
desired new clone) and the Byproduct. 
Buffers 

Various known buffers can be used in the reactions of the 
present invention. For restriction enzymes, it is advisable to 
use the buffers recommended by the manufacturer. Alterna- 
tive buffers can be readily found in the literature or can be 
devised by those of ordinary skill in the art. 

Examples 1-3. One exemplary buffer for lambda
integrase is comprised of 50 mM Tris-HCl, at pH 7.5-7.8, 70
mM KCl, 5 mM spermidine, 0.5 mM EDTA, and 0.25 mg/ml
bovine serum albumin, and optionally, 10% glycerol.

One preferred buffer for P1 Cre recombinase is comprised
of 50 mM Tris-HCl at pH 7.5, 33 mM NaCl, 5 mM
spermidine, and 0.5 mg/ml bovine serum albumin.

The buffers for other site-specific recombinases which are
similar to lambda Int and P1 Cre are either known in the art
or can be determined empirically by the skilled artisan,
particularly in light of the above-described buffers.

Example 1 

Recombinational Cloning Using Cre and Cre & Int

Two pairs of plasmids were constructed to do the in vitro
recombinational cloning method in two different ways. One pair,
pEZC705 and pEZC726 (FIG. 2A), was constructed with loxP and att
sites, to be used with Cre and λ integrase. The other pair,
pEZC602 and pEZC629 (FIG. 3A), contained the loxP (wild type)
site for Cre, and a second mutant lox site, loxP 511, which
differs from loxP in one base (out of 34 total). The minimum
requirement for recombinational cloning of the present invention
is two recombination sites in each plasmid, in general X and Y,
and X' and Y'. Recombinational cloning takes place if either or
both types of site can recombine to form a Cointegrate (e.g., X
and X'), and if either or both (but necessarily a site different
from the type forming the Cointegrate) can recombine to excise
the Product and Byproduct plasmids from the Cointegrate (e.g., Y
and Y'). It is important that the recombination sites on the same
plasmid do not recombine. It was found that the present
recombinational cloning could be done with Cre alone.
Cre-Only 

Two plasmids were constructed to demonstrate this conception
(see FIG. 3A). pEZC629 was the Vector Donor plasmid. It contained
a constitutive drug marker (chloramphenicol resistance), an
origin of replication, loxP and loxP 511 sites, a conditional
drug marker (kanamycin resistance whose expression is controlled
by the operator/promoter of the tetracycline resistance operon of
transposon Tn10), and a constitutively expressed gene for the tet
repressor protein, tetR. E. coli cells containing pEZC629 were
resistant to chloramphenicol at 30 µg/ml, but sensitive to
kanamycin at 100 µg/ml. pEZC602 was the Insert Donor plasmid,
which contained a different drug marker (ampicillin resistance),
an origin, and loxP and loxP 511 sites flanking a multiple
cloning site.

This experiment was comprised of two parts as follows:
Part I: About 75 ng each of pEZC602 and pEZC629 were mixed in a
total volume of 30 µl of Cre buffer (50 mM Tris-HCl pH 7.5,
33 mM NaCl, 5 mM spermidine-HCl, 500 µg/ml bovine serum albumin).
Two 10 µl aliquots were transferred to new tubes. One tube
received 0.5 µl of Cre protein (approx. 4 units per µl; partially
purified according to Abremski and Hoess, J. Biol. Chem. 259:1509
(1984)). Both tubes were incubated at 37° C. for 30 minutes, then
70° C. for 10 minutes. Aliquots of each reaction were diluted and
transformed into DH5α. Following expression, aliquots were plated
on 30 µg/ml chloramphenicol; 100 µg/ml ampicillin plus 200 µg/ml
methicillin; or 100 µg/ml kanamycin.

Results: See Table 1. The reaction without Cre gave 1.11 x 10^6
ampicillin resistant colonies (from the Insert Donor plasmid
pEZC602); 7.8 x 10^5 chloramphenicol resistant colonies (from the
Vector Donor plasmid pEZC629); and 140 kanamycin resistant
colonies (background). The reaction with added Cre gave
7.5 x 10^5 ampicillin resistant colonies (from the Insert Donor
plasmid pEZC602); 6.1 x 10^5 chloramphenicol resistant colonies
(from the Vector Donor plasmid pEZC629); and 760 kanamycin
resistant colonies (mixture of background colonies and colonies
from the recombinational cloning Product plasmid). Analysis:
Because the number of colonies on the kanamycin plates was much
higher in the presence of Cre, many or most of them were
predicted to contain the desired Product plasmid.

TABLE 1

Enzyme   Ampicillin   Chloramphenicol   Kanamycin   Efficiency

None     1.1 x 10^6   7.8 x 10^5        140         140/7.8 x 10^5 = 0.02%

Cre      7.5 x 10^5   6.1 x 10^5        760         760/6.1 x 10^5 = 0.12%
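
The efficiency column of Table 1 is the kanamycin resistant count
divided by the chloramphenicol resistant count. A minimal Python sketch
of that arithmetic follows; the function name and the dictionary layout
are illustrative assumptions and are not part of the original
disclosure.

```python
# Colony counts taken from Table 1.
TABLE_1 = {
    "None": {"amp": 1.1e6, "cam": 7.8e5, "kan": 140},
    "Cre":  {"amp": 7.5e5, "cam": 6.1e5, "kan": 760},
}

def efficiency_percent(kan_colonies, cam_colonies):
    """Kanamycin resistant colonies as a percentage of the
    chloramphenicol resistant (Vector Donor) colonies."""
    return 100.0 * kan_colonies / cam_colonies

for enzyme, counts in TABLE_1.items():
    value = efficiency_percent(counts["kan"], counts["cam"])
    print(f"{enzyme}: {value:.2f}%")
```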



Part II: Twenty four colonies from the "+Cre" kanamycin plates
were picked and inoculated into medium containing 100 µg/ml
kanamycin. Minipreps were done, and the miniprep DNAs, uncut or
cut with SmaI or HindIII, were electrophoresed. Results: 19 of
the 24 minipreps showed supercoiled plasmid of the size predicted
for the Product plasmid. All 19 showed the predicted SmaI and
HindIII restriction fragments. Analysis: The Cre only scheme was
demonstrated. Specifically, it was determined to have yielded
about 70% (19 of 24) Product clones. The efficiency was about
0.1% (760 kanamycin resistant clones resulted from 6.1 x 10^5
chloramphenicol resistant colonies).
Cre Plus Integrase 

The plasmids used to demonstrate this method are exactly
analogous to those used above, except that pEZC726, the Vector
Donor plasmid, contained an attP site in place of loxP 511, and
pEZC705, the Insert Donor plasmid, contained an attB site in
place of loxP 511 (FIG. 2A).

This experiment was comprised of three parts as follows:
Part I: About 500 ng of pEZC705 (the Insert Donor plasmid) was
cut with ScaI, which linearized the plasmid within the ampicillin
resistance gene. (This was done because the λ integrase reaction
has been historically done with the attB plasmid in a linear
state (H. Nash, personal communication). However, it was found
later that the integrase reaction proceeds well with both
plasmids supercoiled.) Then, the linear plasmid was ethanol
precipitated and dissolved in 20 µl of λ integrase buffer (50 mM
Tris-HCl, about pH 7.8, 70 mM KCl, 5 mM spermidine-HCl, 0.5 mM
EDTA, 250 µg/ml bovine serum albumin). Also, about 500 ng of the
Vector Donor plasmid pEZC726 was ethanol precipitated and
dissolved in 20 µl λ integrase buffer. Just before use, λ
integrase (2 µl, 393 µg/ml) was thawed and diluted by adding
18 µl cold λ integrase buffer. One µl IHF (integration host
factor, 2.4 mg/ml, an accessory protein) was diluted into 150 µl
cold λ integrase buffer. Aliquots (2 µl) of each DNA were mixed
with λ integrase buffer, with or without 1 µl each λ integrase
and IHF, in a total of 10 µl. The mixture was incubated at 25° C.
for 45 minutes, then at 70° C. for 10 minutes. Half of each
reaction was applied to an agarose gel. Results: In the presence
of integrase and IHF, about 5% of the total DNA was converted to
a linear Cointegrate form. Analysis: Activity of integrase and
IHF was confirmed.

Part II: Three microliters of each reaction (i.e., with or
without integrase and IHF) were diluted into 27 µl of Cre buffer
(above), then each reaction was split into two 10 µl aliquots
(four altogether). To two of these reactions, 0.5 µl of Cre
protein (above) were added, and all reactions were incubated at
37° C. for 30 minutes, then at 70° C. for 10 minutes. TE buffer
(90 µl; TE: 10 mM Tris-HCl, pH 7.5, 1 mM EDTA) was added to each
reaction, and 1 µl each was transformed into E. coli DH5α. The
transformation mixtures were plated on 100 µg/ml ampicillin plus
200 µg/ml methicillin; 30 µg/ml chloramphenicol; or 100 µg/ml
kanamycin. Results: See Table 2.

TABLE 2

Enzyme             Ampicillin   Chloramphenicol   Kanamycin   Efficiency

None               990          20000             4           4/2 x 10^4 = 0.02%

Cre only           280          3640              0           0

Integrase* only    1040         27000             9           9/2.7 x 10^4 = 0.03%

Integrase* + Cre   110          1110              76          76/1.1 x 10^3 = 6.9%

*Integrase reactions also contained IHF.

Analysis: The Cre protein impaired transformation. When adjusted
for this effect, the number of kanamycin resistant colonies,
compared to the control reactions, increased more than 100 fold
when both Cre and Integrase were used. This suggests a
specificity of greater than 99%.

Part III: 38 colonies were picked from the Integrase plus Cre
plates, miniprep DNAs were made and cut with HindIII to give
diagnostic mapping information. Result: All 38 had precisely the
expected fragment sizes. Analysis: The Cre plus λ integrase
method was observed to have much higher specificity than
Cre-alone. Conclusion: The Cre plus λ integrase method was
demonstrated. Efficiency and specificity were much higher than
for Cre only.

Example 2

Using in vitro Recombinational Cloning to Subclone the
Chloramphenicol Acetyl Transferase Gene into a Vector for
Expression in Eukaryotic Cells (FIG. 4A)

An Insert Donor plasmid, pEZC843, was constructed, comprising the
chloramphenicol acetyl transferase gene of E. coli, cloned
between loxP and attB sites such that the loxP site was
positioned at the 5'-end of the gene (FIG. 4B). A Vector Donor
plasmid, pEZC1003, was constructed, which contained the
cytomegalovirus eukaryotic promoter apposed to a loxP site
(FIG. 4C). One microliter aliquots of each supercoiled plasmid
(about 50 ng crude miniprep DNA) were combined in a ten
microliter reaction containing equal parts of lambda integrase
buffer (50 mM Tris-HCl, pH 7.8, 70 mM KCl, 5 mM spermidine,
0.5 mM EDTA, 0.25 mg/ml bovine serum albumin) and Cre recombinase
buffer (50 mM Tris-HCl pH 7.5, 33 mM NaCl, 5 mM spermidine,
0.5 mg/ml bovine serum albumin), [quantity illegible] Cre
recombinase, 16 ng integration host factor, and 32 ng lambda
integrase. After incubation at 30° C. for 30 minutes and 75° C.
for 10 minutes, one microliter was transformed into competent
E. coli strain DH5α (Life Technologies, Inc.). Aliquots of
transformations were spread on agar plates containing 200 µg/ml
kanamycin and incubated at 37° C. overnight. An otherwise
identical control reaction contained the Vector Donor plasmid
only. The plate receiving 10% of the control reaction
transformation gave one colony; the plate receiving 10% of the
recombinational cloning reaction gave 144 colonies. These numbers
suggested that greater than 99% of the recombinational cloning
colonies contained the desired product plasmid. Miniprep DNA made
from six recombinational cloning colonies gave the predicted size
plasmid (5026 base pairs), CMVProd. Restriction digestion with
NcoI gave the fragments predicted for the chloramphenicol acetyl
transferase cloned downstream of the CMV promoter for all six
plasmids.

Example 3

Subcloned DNA Segments Flanked by attB Sites Without Stop Codons

Part I: Background

The above examples are suitable for transcriptional fusions, in
which transcription crosses recombination sites. However, both
attR and loxP sites contain multiple stop codons on both strands,
so translational fusions, where the coding sequence must cross
the recombination sites, can be difficult (only one reading frame
is available on each strand of loxP sites) or impossible (in attR
or attL).

A principal reason for subcloning is to fuse protein domains. For
example, fusion of the glutathione S-transferase (GST) domain to
a protein of interest allows the fusion protein to be purified by
affinity chromatography on glutathione agarose (Pharmacia, Inc.,
1995 catalog). If the protein of interest is fused to runs of
consecutive histidines (for example His6), the fusion protein can
be purified by affinity chromatography on chelating resins
containing metal ions (Qiagen, Inc.). It is often desirable to
compare amino terminal and carboxy terminal fusions for activity,
solubility, stability, and the like.

The attB sites of the bacteriophage λ integration system were
examined as an alternative to loxP sites, because they are small
(25 bp) and have some sequence flexibility (Nash, H. A. et al.,
Proc. Natl. Acad. Sci. USA 84:4049-4053 (1987)). It was not
previously suggested that multiple mutations to remove all stop
codons would result in useful recombination sites for
recombinational subcloning.

Using standard nomenclature for site specific recombination in
lambda bacteriophage (Weisberg, in Lambda III, Hendrix, et al.,
eds., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
(1989)), the nucleotide regions that participate in the
recombination reaction in an E. coli host cell are represented as
follows:

[Reaction scheme, only partly legible in the source: one
direction is labeled Int, IHF and the other Xis, Int, IHF; the
attR site is drawn as P1-H1-P2-X-H2-C-O-B'.]

where: O represents the 15 bp core DNA sequence found in both the
phage and E. coli genomes; B and B' represent approximately 5
bases adjacent to the core in the E. coli genome; and P1, H1, P2,
X, H2, C, C', H', P'1, P'2, and P'3 represent known DNA sequences
encoding protein binding domains in the bacteriophage λ genome.

The reaction is reversible in the presence of the protein Xis
(excisionase); recombination between attL and attR precisely
excises the λ genome from its integrated state, regenerating the
circular λ genome containing attP and the linear E. coli genome
containing attB.

Part II: Construction and Testing of Plasmids Containing Mutant
att Sites

Mutant attL and attR sites were constructed. Importantly, Landy
et al. (Ann. Rev. Biochem. 58:913 (1989)) observed that deletion
of the P1 and H1 domains of attP facilitated the excision
reaction and eliminated the integration reaction, thereby making
the excision reaction irreversible. Therefore, as mutations were
introduced in attR, the P1 and H1 domains were also deleted. attR
sites in the present example lack the P1 and H1 regions and have
the NdeI site removed (base 27630 changed from C to G), and
contain sequences corresponding to bacteriophage λ coordinates
27619-27738 (GenBank release 92.0, bg:LAMCG, "Complete Sequence
of Bacteriophage Lambda").

The sequence of attB produced by recombination of wild type attL
and attR sites is:



The stop codons are italicized and underlined. Note that
sequences of attL, attR, and attP can be derived from the attB
sequence and the boundaries of bacteriophage λ contained within
attL and attR (coordinates 27619 to 27818).

When mutant attR1 and attL1 sites were recombined the sequence
attB1 was produced (mutations in bold, large font):



Note that the four stop codons are gone.

When an additional mutation was introduced in the attR1 and attL1
sequences (bold), attR2 and attL2 sites resulted. Recombination
of attR2 and attL2 produced the attB2 site:


The recombination activities of the above attL and attR sites
were assayed as follows. The attB site of plasmid pEZC705
(FIG. 2B) was replaced with attLwt, attL1, or attL2. The attP
site of plasmid pEZC726 (FIG. 2C) was replaced with attRwt
(lacking regions P1 and H1), attR1, or attR2. Thus, the resulting
plasmids could recombine via their loxP sites, mediated by Cre,
and via their attR and attL sites, mediated by Int, Xis, and IHF.
Pairs of plasmids were mixed and reacted with Cre, Int, Xis, and
IHF, transformed into E. coli competent cells, and plated on agar
containing kanamycin. The results are presented in Table 3:

TABLE 3

Vector donor att site   Gene donor att site   Kanamycin resistant colonies*

attRwt (pEZC1301)       None                  1 (background)
                        attLwt (pEZC1313)     147
                        attL1 (pEZC1317)      [illegible]
                        attL2 (pEZC1321)      0

attR1 (pEZC1305)        None                  1 (background)
                        attLwt (pEZC1313)     [illegible]
                        attL1 (pEZC1317)      128
                        attL2 (pEZC1321)      0

attR2 (pEZC1309)        None                  9 (background)
                        attLwt (pEZC1313)     0
                        attL1 (pEZC1317)      0
                        attL2 (pEZC1321)      209

(*1% of each transformation was spread on a kanamycin plate.)

The above data show that whereas the wild type att and att1 sites
recombine to a small extent, the att1 and att2 sites do not
recombine detectably with each other.

Part III. Recombination was demonstrated when the core region of
both attB sites flanking the DNA segment of interest did not
contain stop codons. The physical state of the participating
plasmids was discovered to influence recombination efficiency.

The appropriate att sites were moved into pEZC705 and pEZC726 to
make the plasmids pEZC1405 (FIG. 5G) (attR1 and attR2) and
pEZC1502 (FIG. 5H) (attL1 and attL2). The desired DNA segment in
this experiment was a copy of the chloramphenicol resistance gene
cloned between the two attL sites of pEZC1502. Pairs of plasmids
were recombined in vitro using Int, Xis, and IHF (no Cre because
no loxP sites were present). The yield of desired kanamycin
resistant colonies was determined when both parental plasmids
were circular, or when one plasmid was circular and the other
linear, as presented in Table 4:

TABLE 4

Vector donor(1)      Gene donor(1)        Kanamycin resistant colonies(2)

Circular pEZC1405    None                 30

Circular pEZC1405    Circular pEZC1502    2680

Linear pEZC1405      None                 90

Linear pEZC1405      Circular pEZC1502    172000

Circular pEZC1405    Linear pEZC1502      73000

(1) DNAs were purified with Qiagen columns, concentrations
determined by A260, and linearized with Xba I (pEZC1405) or
AlwN I (pEZC1502). Each reaction contained 100 ng of the
indicated DNA. All reactions (10 µl total) contained 3 µl of
enzyme mix (Xis, Int, and IHF). After incubation (45 minutes at
25°, 10 minutes at 65°), one µl was used to transform E. coli
DH5α
(2) ... (1 ml) had ...


Analysis: Recombinational cloning using mutant attR and attL
sites was confirmed. The desired DNA segment is subcloned between
attB sites that do not contain any stop codons in either strand.
The enhanced yield of Product DNA (when one parent was linear)
was unexpected because of earlier observations that the excision
reaction was more efficient when both participating molecules
were supercoiled and proteins were limiting (Nunes-Duby et al.,
Cell 50:779-788 (1987)).
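
The size of the effect reported in Table 4 can be made explicit with a
short calculation. The Python sketch below is illustrative only: the
function names are hypothetical, and treating the vector-only reaction
as the background for the matching two-plasmid reaction is an
assumption made here for the estimate.

```python
# Colony counts from Table 4: (vector donor state, gene donor state) -> colonies.
TABLE_4 = {
    ("circular", None): 30,
    ("circular", "circular"): 2680,
    ("linear", None): 90,
    ("linear", "circular"): 172000,
    ("circular", "linear"): 73000,
}

def fold_change(reference, value):
    """How many times larger `value` is than `reference`."""
    return value / reference

def product_fraction(colonies, background):
    """Fraction of colonies not accounted for by the vector-only background
    (assumes the background adds to, rather than scales with, the yield)."""
    return (colonies - background) / colonies

both_circular = TABLE_4[("circular", "circular")]
print(fold_change(both_circular, TABLE_4[("linear", "circular")]))
print(fold_change(both_circular, TABLE_4[("circular", "linear")]))
print(product_fraction(both_circular, TABLE_4[("circular", None)]))
```

On these numbers, linearizing the vector donor raises the colony yield
by roughly 64 fold and linearizing the gene donor by roughly 27 fold
relative to the reaction in which both plasmids were circular.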



Example 4

Demonstration of Recombinational Cloning Without Inverted Repeats

Part I. Rationale

The above Example 3 showed that plasmids containing inverted
repeats of the appropriate recombination sites (for example,
attL1 and attL2 in plasmid pEZC1502) (FIG. 5H) could recombine to
give the desired DNA segment flanked by attB sites without stop
codons, also in inverted orientation. A concern was the in vivo
and in vitro influence of the inverted repeats. For example,
transcription of a desired DNA segment flanked by attB sites in
inverted orientation could yield a single stranded RNA molecule
that might form a hairpin structure, thereby inhibiting
translation.

Inverted orientation of similar recombination sites can be
avoided by placing the att sites in direct repeat arrangement. If
parental plasmids each have a wild type attL and wild type attR
site in direct repeat, the Int, Xis, and IHF proteins will simply
remove the DNA segment flanked by those sites in an
intramolecular reaction. However, the mutant sites described in
the above Example 3 suggested that it might be possible to
inhibit the intramolecular reaction while allowing the
intermolecular recombination to proceed as desired.

Part II: Structure of Plasmids Without Inverted Repeats for
Recombinational Cloning

The attR2 sequence in plasmid pEZC1405 (FIG. 5G) was replaced
with attL2, in the opposite orientation, to make pEZC1603
(FIG. 6A). The attL2 sequence of pEZC1502 (FIG. 5H) was replaced
with attR2, in the opposite orientation, to make pEZC1706
(FIG. 6B). Each of these plasmids contained mutations in the core
region that make intramolecular reactions between att1 and att2
cores very inefficient (see Example 3, above).

Plasmids pEZC1405, pEZC1502, pEZC1603 and pEZC1706 were purified
on Qiagen columns (Qiagen, Inc.). Aliquots of plasmids pEZC1405
and pEZC1603 were linearized with Xba I. Aliquots of plasmids
pEZC1502 and pEZC1706 were linearized with AlwN I. One hundred ng
of plasmids were mixed in buffer (equal volumes of 50 mM Tris-HCl
pH 7.5, 25 mM Tris-HCl pH 8.0, 70 mM KCl, 5 mM spermidine, 0.5 mM
EDTA, 250 µg/ml BSA, 10% glycerol) containing Int (43.5 ng), Xis
(4.3 ng) and IHF (8.1 ng) in a final volume of 10 µl. Reactions
were incubated for 45 minutes at 25° C., 10 minutes at 65° C.,
and 1 µl was transformed into E. coli DH5α. After expression,
aliquots were spread on agar plates containing 200 µg/ml
kanamycin and incubated at 37° C.

Results, expressed as the number of colonies per 1 µl of
recombination reaction, are presented in Table 5:

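TABLE 5

Vector Donor     Gene Donor     Colonies     Predicted % product

Linear 1405
Circular 1405
Circular 1603
Circular 1603
Linear 1603
Circular 1603

[Only these Vector Donor entries are legible in the source; the
remaining cells of the table did not survive.]
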

Analysis. In all configurations, i.e., circular or linear, the
pEZC1405 x pEZC1502 pair (with att sites in inverted repeat
configuration) was more efficient than the pEZC1603 x pEZC1706
pair (with att sites mutated to avoid hairpin formation). The
pEZC1603 x pEZC1706 pair gave higher backgrounds and lower
efficiencies than the pEZC1405 x pEZC1502 pair. While less
efficient, 80% or more of the colonies from the pEZC1603 x
pEZC1706 reactions were expected to contain the desired plasmid
product. Making one partner linear stimulated the reactions in
all cases.
Part III: Confirmation of Product Plasmids' Structure

Six colonies each from the linear pEZC1405 (FIG. 5G) x circular
pEZC1502 (FIG. 5H), circular pEZC1405 x linear pEZC1502, linear
pEZC1603 (FIG. 6A) x circular pEZC1706 (FIG. 6B), and circular
pEZC1603 x linear pEZC1706 reactions were picked into rich medium
and miniprep DNAs were prepared. Diagnostic cuts with Ssp I gave
the predicted restriction fragments for all 24 colonies.

Analysis. Recombination reactions between plasmids with mutant
attL and attR sites on the same molecules gave the desired
plasmid products with a high degree of specificity.


Example 5 

Recombinational Cloning with a Toxic Gene
Part I: Background

Restriction enzyme Dpn I recognizes the sequence GATC and cuts
that sequence only if the A is methylated by the dam methylase.
Most commonly used E. coli strains are dam+. Expression of Dpn I
in dam+ strains of E. coli is lethal because the chromosome of
the cell is chopped into many pieces. However, in dam- cells
expression of Dpn I is innocuous because the chromosome is immune
to Dpn I cutting.

In the general recombinational cloning scheme, in which the
vector donor contains two segments C and D separated by
recombination sites, selection for the desired product depends
upon selection for the presence of segment D, and the absence of
segment C. In the original Example segment D contained a drug
resistance gene (Km) that was negatively controlled by a
repressor gene found on segment C. When C was present, cells
containing D were not resistant to kanamycin because the
resistance gene was turned off.

The Dpn I gene is an example of a toxic gene that can replace the
repressor gene of the above embodiment. If segment C expresses
the Dpn I gene product, transforming plasmid CD into a dam+ host
kills the cell. If segment D is transferred to a new plasmid, for
example by recombinational cloning, then selecting for the drug
marker will be successful because the toxic gene is no longer
present.
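
The selection logic described above reduces to a small truth table. The
Python sketch below is an illustrative model only: the function name is
hypothetical, segments C and D are as defined in the text, and the
labels A and B (the segment of interest and the gene donor backbone)
are assumptions added here to name the other two segments.

```python
def colony_forms(host_is_dam_plus, segments):
    """Return True if a cell carrying a plasmid made of `segments` is
    expected to grow under selection for the marker on segment D.

    Segment C is assumed to express Dpn I, which is lethal only in a
    dam+ host; segment D is assumed to carry the selected drug marker.
    """
    has_marker = "D" in segments
    killed_by_dpn1 = "C" in segments and host_is_dam_plus
    return has_marker and not killed_by_dpn1

vector_donor = {"C", "D"}   # toxic gene plus drug marker
product = {"A", "D"}        # segment of interest joined to the drug marker
byproduct = {"C", "B"}      # toxic gene, no selected marker

for name, plasmid in [("vector donor", vector_donor),
                      ("product", product),
                      ("byproduct", byproduct)]:
    print(name, colony_forms(True, plasmid), colony_forms(False, plasmid))
```

Under these assumptions only the product gives colonies in a dam+ host,
while the vector donor itself can still be propagated in a dam- host.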
Part II: Construction of a Vector Donor Using Dpn I as a Toxic
Gene

The gene encoding Dpn I endonuclease was amplified by PCR using
primers 5'CCA CCA CAA ACG CGT CCA TGG AAT TAC ACT TTA ATT TAG3'
(SEQ. ID NO:17) and 5'CCA CCA CAA GTC GAC GCA TGC CGA CAG CCT TCC
AAA TGT3' (SEQ. ID NO:18) and a plasmid containing the Dpn I gene
(derived from plasmids obtained from Sanford A. Lacks, Brookhaven
National Laboratory, Upton, N.Y.; also available from American
Type Culture Collection as ATCC 67494) as the template.

Additional mutations were introduced into the B and B' regions of
attL and attR, respectively, by amplifying existing attL and attR
domains with primers containing the desired base changes.
Recombination of the mutant attL3 (made with oligo Xis115) and
attR3 (made with oligo Xis112) yielded attB3 with the following
sequence (differences from attB1 in bold):

     B          O           B'
5' ACCCA GCTTTCTTGTACAAA GTGGT 3' (SEQ. ID NO:8)
3' TGGGT CGAAAGAACATGTTT CACCA 5' (SEQ. ID NO:35)
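
Whether a recombination site can be crossed by a coding sequence
depends on stop codons in all six reading frames. As a hedged
illustration, the Python sketch below scans the attB3 sequence given
above; the function names are hypothetical, and the check covers only
the 25 bases listed, not any flanking sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def stop_codons(seq):
    """List (strand, frame, offset, codon) for every in-frame stop codon
    in the three frames of each strand."""
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if codon in STOP_CODONS:
                    hits.append((strand, frame, i, codon))
    return hits

ATTB3_TOP = "ACCCAGCTTTCTTGTACAAAGTGGT"            # written 5' to 3'
ATTB3_BOTTOM_3_TO_5 = "TGGGTCGAAAGAACATGTTTCACCA"  # written 3' to 5', as listed

# The two listed strands should be exact complements of each other.
print(reverse_complement(ATTB3_TOP) == ATTB3_BOTTOM_3_TO_5[::-1])
print(stop_codons(ATTB3_TOP))
```

The same function can be applied to any other candidate site sequence
to see which frames, if any, are closed by a stop codon.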



The attL3 sequence was cloned in place of attL2 of an existing
Gene Donor plasmid to give the plasmid pEZC2901 (FIG. 7A). The
attR3 sequence was cloned in place of attR2 in an existing Vector
Donor plasmid to give plasmid pEZC2913 (FIG. 7B). The Dpn I gene
was cloned into plasmid pEZC2913 to replace the tet repressor
gene. The resulting Vector Donor plasmid was named pEZC3101
(FIG. 7C). When pEZC3101 was transformed into the dam- strain
SCS110 (Stratagene), hundreds of colonies resulted. When the same
plasmid was transformed into the dam+ strain DH5α, only one
colony was produced, even though the DH5α cells were about 20
fold more competent than the SCS110 cells. When a related plasmid
that did not contain the Dpn I gene was transformed into the same
two cell lines, 28 colonies were produced from the SCS110 cells,
while 448 colonies resulted from the DH5α cells. This is evidence
that the Dpn I gene is being expressed on plasmid pEZC3101
(FIG. 7C), and that it is killing the dam+ DH5α cells but not the
dam- SCS110 cells.

Part III: Demonstration of Recombinational Cloning Using Dpn I
Selection

A pair of plasmids was used to demonstrate recombinational
cloning with selection for product dependent upon the toxic gene
Dpn I. Plasmid pEZC3101 (FIG. 7C) was linearized with Mlu I and
reacted with circular plasmid pEZC2901 (FIG. 7A). A second pair
of plasmids using selection based on control of drug resistance
by a repressor gene was used as a control: plasmid pEZC1802
(FIG. 7D) was linearized with Xba I and reacted with circular
plasmid pEZC1502 (FIG. 5H). Eight microliter reactions containing
the same buffer and proteins Xis, Int, and IHF as in previous
examples were incubated for 45 minutes at 25° C., then 10 minutes
at 75° C., and 1 µl aliquots were transformed into DH5α (i.e.,
dam+) competent cells, as presented in Table 6.

TABLE 6

Reaction #   Vector donor   Basis of selection   Gene donor          Colonies

1            pEZC3101/Mlu   Dpn I toxicity       None                3

2            pEZC3101/Mlu   Dpn I toxicity       Circular pEZC2901   4000

3            pEZC1802/Xba   Tet repressor        None                0

4            pEZC1802/Xba   Tet repressor        Circular pEZC1502   12100


Miniprep DNAs were prepared from four colonies from 
reaction #2, and cut with restriction enzyme Ssp I. All gave 
the predicted fragments. 

Analysis: Subcloning using selection with a toxic gene 
was demonstrated. Plasmids of the predicted structure were 
produced. 

Example 6 

Cloning of Genes with Uracil DNA Glycosylase and Subcloning of
the Genes with Recombinational Cloning to Make Fusion Proteins
Part I: Converting an Existing Expression Vector to a Vector
Donor for Recombinational Cloning

A cassette useful for converting existing vectors into functional
Vector Donors was made as follows. Plasmid pEZC3101 (FIG. 7C) was
digested with Apa I and Kpn I, treated with T4 DNA polymerase and
dNTPs to render the ends blunt, further digested with Sma I,
Hpa I, and AlwN I to render the undesirable DNA fragments small,
and the 2.6 kb cassette containing the attR1-CmR-Dpn I-attR3
domains was gel purified. The concentration of the purified
cassette was estimated to be about 75 ng DNA/µl.

Plasmid pGEX-2TK (FIG. 8A) (Pharmacia) allows fusions between the
protein glutathione S transferase and any second coding sequence
that can be inserted in its multiple cloning site. pGEX-2TK DNA
was digested with Sma I and treated with alkaline phosphatase.
About 75 ng of the above purified DNA cassette was ligated with
about 100 ng of the pGEX-2TK vector for 2.5 hours in a 5 µl
ligation, then 1 µl was transformed into competent BRL3056 cells
(a dam- derivative of DH10B; dam- strains commercially available
include DM1 from Life Technologies, Inc., and SCS110 from
Stratagene). Aliquots of the transformation mixture were plated
on LB agar containing 100 µg/ml ampicillin (resistance gene
present on pGEX-2TK) and 30 µg/ml chloramphenicol (resistance
gene present on the DNA cassette). Colonies were picked and
miniprep DNAs were made. The orientation of the cassette in
pGEX-2TK was determined by diagnostic cuts with EcoR I. A plasmid
with the desired orientation was named pEZC3501 (FIG. 8B).
Part II: Cloning Reporter Genes Into a Recombinational Cloning
Gene Donor Plasmid in Three Reading Frames

Uracil DNA glycosylase (UDG) cloning is a method for cloning PCR
amplification products into cloning vectors (U.S. Pat. No.
5,334,515, entirely incorporated herein by reference). Briefly,
PCR amplification of the desired DNA segment is performed with
primers that contain uracil bases in place of thymidine bases in
their 5' ends. When such PCR products are incubated with the
enzyme UDG, the uracil bases are specifically removed. The loss
of these bases weakens base pairing in the ends of the PCR
product DNA, and when incubated at a suitable temperature (e.g.,
37° C.), the ends of such products are largely single stranded.
If such incubations are done in the presence of linear cloning
vectors containing protruding 3' tails that are complementary to
the 3' ends of the PCR products, base pairing efficiently anneals
the PCR products to the cloning vector. When the annealed product
is introduced into E. coli cells by transformation, in vivo
processes efficiently convert it into a recombinant plasmid.

UDG cloning vectors that enable cloning of any PCR product in all
three reading frames were prepared from pEZC3201 (FIG. 8K) as
follows. Eight oligonucleotides were obtained from Life
Technologies, Inc. (all written 5'→3'): rf1 top (GGCC GAT TAC GAT
ATC CCA ACG ACC GAA AAC CTG TAT TTT CAG GGT) (SEQ. ID NO:19), rf1
bottom (CAG GTT TTC GGT CGT TGG GAT ATC GTA ATC) (SEQ. ID NO:20),
rf2 top (GGCCA GAT TAC GAT ATC CCA ACG ACC GAA AAC CTG TAT TTT
CAG GGT) (SEQ. ID NO:21), rf2 bottom (CAG GTT TTC GGT CGT TGG GAT
ATC GTA ATC T) (SEQ. ID NO:22), rf3 top (GGCCAA GAT TAC GAT ATC
CCA ACG ACC GAA AAC CTG TAT TTT CAG GGT) (SEQ. ID NO:23), rf3
bottom (CAG GTT TTC GGT CGT TGG GAT ATC GTA ATC TT) (SEQ. ID
NO:24), carboxy top (ACC GTT TAC GTC GAC) (SEQ. ID NO:25) and
carboxy bottom (TCGA GTC CAC GTA AAC GGT TCC CAC TTA TTA) (SEQ.
ID NO:26). The rf1, 2, and 3 top strands and the carboxy bottom
strand were phosphorylated on their 5' ends with T4
polynucleotide kinase, and then the complementary strands of each
pair were hybridized. Plasmid pEZC3201 (FIG. 8K) was cut with
Not I and Sal I, and aliquots of cut plasmid were mixed with the
carboxy-oligo duplex (Sal I end) and either the rf1, rf2, or rf3
duplexes (Not I ends) (10 µg cut plasmid (about 5 pmol) mixed
with 250 pmol carboxy oligo duplex, split into three 20 µl
volumes, added 5 µl (250 pmol) of rf1, rf2, or rf3 duplex and
2 µl = 2 units T4 DNA ligase to each reaction). After 90 minutes
of ligation at room temperature, each reaction was applied to a
preparative agarose gel and the 2.1 kb vector bands were eluted
and dissolved in 50 µl of TE.
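
The mass and molar amounts quoted for the ligation can be related with
a standard conversion. The Python sketch below is illustrative only: it
assumes an average mass of about 650 g/mol per base pair of double
stranded DNA, which is a common approximation and is not stated in the
text, and the function names are hypothetical.

```python
AVG_MASS_PER_BP = 650.0  # g/mol per base pair of dsDNA (assumed approximation)

def pmol_from_ug(mass_ug, length_bp):
    """Convert a mass of double stranded DNA to picomoles of molecules."""
    return mass_ug * 1e6 / (length_bp * AVG_MASS_PER_BP)

def implied_length_bp(mass_ug, pmol):
    """Length of DNA for which `mass_ug` corresponds to `pmol` molecules."""
    return mass_ug * 1e6 / (pmol * AVG_MASS_PER_BP)

# "10 µg cut plasmid (about 5 pmol)" implies a plasmid of roughly this size:
print(round(implied_length_bp(10, 5)))

# Oligo duplex to vector molar ratios in the ligation as described:
vector_pmol_per_reaction = 5 / 3       # 5 pmol split into three reactions
carboxy_pmol_per_reaction = 250 / 3    # 250 pmol split into three reactions
rf_pmol_per_reaction = 250             # added separately to each reaction
print(carboxy_pmol_per_reaction / vector_pmol_per_reaction)
print(rf_pmol_per_reaction / vector_pmol_per_reaction)
```

Under these assumptions both oligonucleotide duplexes are in large
molar excess over the cut plasmid ends.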
Part III: PCR of CAT and phoA Genes 

Primers were obtained from Life Technologies, Inc., to
amplify the chloramphenicol acetyl transferase (CAT) gene
from plasmid pACYC184, and phoA, the alkaline phosphatase
gene from E. coli. The primers had 12-base 5'
extensions containing uracil bases, so that treatment of PCR
products with uracil DNA glycosylase (UDG) would
weaken base pairing at each end of the DNAs and allow the
3' strands to anneal with the protruding 3' ends of the rf1, 2,
and 3 vectors described above. The sequences of the primers
(all written 5'-3') were: CAT left, UAU UUU CAG GGU
ATG GAG AAA AAA ATC ACT GGA TAT ACC (SEQ. ID
NO:27); CAT right, UCC CAC UUA UUA CGC CCC GCC
CTG CCA CTC ATC (SEQ. ID NO:28); phoA left, UAU
UUU CAG GGU ATG CCT GTT CTG GAA AAC CGG
(SEQ. ID NO:29); and phoA right, UCC CAC UUA UUA
TTT CAG CCC CAG GGC GGC TTT C (SEQ. ID NO:30).
The primers were then used for PCR reactions using known
method steps (see, e.g., U.S. Pat. No. 5,334,515, entirely
incorporated herein by reference), and the polymerase chain
reaction amplification products obtained with these primers
comprised the CAT or phoA genes with the initiating ATGs
but without any transcriptional signals. In addition, the
uracil-containing sequences on the amino termini encoded
the cleavage site for TEV protease (Life Technologies, Inc.),
and those on the carboxy terminal encoded consecutive TAA
nonsense codons.
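
The coding content of the uracil-containing extensions can be checked by translation. This is a minimal sketch with hypothetical helper names, assuming the standard genetic code and assuming that the three codons GAA AAC CTG immediately upstream of the overhang in the rf top strands are read in frame with the left-hand extension.

```python
# Illustrative sketch only; helper names are hypothetical.
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq: str) -> str:
    """Strip spaces and write uracil as thymine."""
    return seq.replace(" ", "").upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written 5'->3'."""
    return to_dna(seq).translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; '*' marks a stop codon."""
    dna = to_dna(seq)
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


VECTOR_CODONS = "GAA AAC CTG"       # assumed in-frame codons preceding the overhang
LEFT_EXTENSION = "UAU UUU CAG GGU"   # 5' extension of the CAT left and phoA left primers
RIGHT_EXTENSION = "UCC CAC UUA UUA"  # 5' extension of the CAT right and phoA right primers

print(translate(VECTOR_CODONS + LEFT_EXTENSION))        # amino-terminal junction
print(translate(reverse_complement(RIGHT_EXTENSION)))   # coding-strand view of the carboxy end
```

Read this way, the amino-terminal junction spells out a TEV protease recognition sequence, and the coding-strand view of the right-hand extension begins with two TAA stop codons.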

Unpurified PCR products (about 30 ng) were mixed with
the gel purified, linear rf1, rf2, or rf3 cloning vectors (about
50 ng) in a 10 µl reaction containing 1x REact 4 buffer (LTI)
and 1 unit UDG (LTI). After 30 minutes at 37° C, 1 µl
aliquots of each reaction were transformed into competent E.
coli DH5α cells (LTI) and plated on agar containing 50
µg/ml kanamycin. Colonies were picked and analysis of
miniprep DNA showed that the CAT gene had been cloned
in reading frame 1 (pEZC3601)(FIG. 8C), reading frame 2
(pEZC3609)(FIG. 8D) and reading frame 3 (pEZC3617)
(FIG. 8E), and that the phoA gene had been cloned in
reading frame 1 (pEZC3606)(FIG. 8F), reading frame 2
(pEZC3613)(FIG. 8G) and reading frame 3 (pEZC3621)
(FIG. 8H).

Part IV: Subcloning of CAT or phoA from UDG Cloning
Vectors into a GST Fusion Vector

Plasmids encoding fusions between GST and either CAT
or phoA in all three reading frames were constructed by
recombinational cloning as follows. Miniprep DNA of GST
vector donor pEZC3501 (FIG. 8B) (derived from Pharmacia
plasmid pGEX-2TK as described above) was linearized with
Cla I. About 5 ng of vector donor were mixed with about 10
ng each of the appropriate circular gene donor vectors
containing CAT or phoA in 8 µl reactions containing buffer
and recombination proteins Int, Xis, and IHF (above). After
incubation, 1 µl of each reaction was transformed into E. coli
strain DH5α and plated on ampicillin, as presented in Table 7.
TABLE 7

DNA                                   Colonies (10% of each transformation)

Linear vector donor (pEZC3501/Cla)      0
Vector donor + CAT rf1                110
Vector donor + CAT rf2                 71
Vector donor + CAT rf3                148
Vector donor + phoA rf1               121
Vector donor + phoA rf2               128
Vector donor + phoA rf3                31
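
The counts in Table 7 can be scaled to a per-transformation estimate. This sketch assumes that "10% of each transformation" means one tenth of each transformation was plated and that colony yield scales linearly with the plated volume; neither assumption is stated in the text, and the variable names are illustrative.

```python
# Illustrative sketch only; assumes one tenth of each transformation was plated.
PLATED_FRACTION = 0.10

colonies_counted = {
    "Linear vector donor (pEZC3501/Cla)": 0,
    "Vector donor + CAT rf1": 110,
    "Vector donor + CAT rf2": 71,
    "Vector donor + CAT rf3": 148,
    "Vector donor + phoA rf1": 121,
    "Vector donor + phoA rf2": 128,
    "Vector donor + phoA rf3": 31,
}

# Estimated colonies per whole transformation under the linear-scaling assumption.
estimated_totals = {dna: round(n / PLATED_FRACTION) for dna, n in colonies_counted.items()}

for dna, total in estimated_totals.items():
    print(f"{dna:<38}{total:>6}")
```

The linearized vector donor alone gave no colonies, so under these assumptions every colony in the other rows depends on the presence of a gene donor.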



Part V: Expression of Fusion Proteins 

Two colonies from each transformation were picked into
2 ml of rich medium (CircleGrow, Bio101 Inc.) in 17x100
mm plastic tubes (Falcon 2059, Becton Dickinson) containing
100 µg/ml ampicillin and shaken vigorously for about 4
hours at 37° C, at which time the cultures were visibly
turbid. One ml of each culture was transferred to a new tube
containing 10 µl of 10% (w/v) IPTG to induce expression of
GST. After 2 hours additional incubation, all cultures had
about the same turbidity; the A600 of one culture was 1.5.
Cells from 0.35 ml each culture were harvested and treated
with sample buffer (containing SDS and β-mercaptoethanol)
and aliquots equivalent to about 0.15 A600 units of cells
were applied to a Novex 4-20% gradient polyacrylamide
gel. Following electrophoresis the gel was stained with
Coomassie blue.

Results: Enhanced expression of single protein bands was
seen for all 12 cultures. The observed sizes of these proteins
correlated well with the sizes predicted for GST being fused
(through attB recombination sites without stop codons) to
CAT or phoA in three reading frames: CAT rf1=269 amino
acids; CAT rf2=303 amino acids; CAT rf3=478 amino acids;
phoA rf1=282 amino acids; phoA rf2=280 amino acids; and
phoA rf3=705 amino acids.

Analysis: Both CAT and phoA genes were subcloned into
a GST fusion vector in all three reading frames, and expression
of the six fusion proteins was demonstrated.

While the foregoing invention has been described in some
detail for purposes of clarity and understanding, it will be
appreciated by one skilled in the art from a reading of this
disclosure that various changes in form and detail can be
made without departing from the true scope of the invention
and appended claims. All patents and publications cited
herein are entirely incorporated herein by reference.



SEQUENCE LISTING 



( 1 ) GENERAL INFORMATION: 

( i i i ) NUMBER OF SEQUENCES: 35 

( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

RKYCWGCTTT YKTRTACNAA STSGB

( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AGCCWGCTTT YKTRTACNAA CTSGB

( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GTTCAGCTTT CKTRTACNAA CTSGB

( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:4:

AGCCWGCTTT CKTRTACNAA GTSGB

( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GTTCAGCTTT YKTRTACNAA GTSGB

( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

AGCCTGCTTT TTTGTACAAA CTTGT

( 2 ) INFORMATION FOR SEQ ID NO:7:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:7:

AGCCTGCTTT CTTGTACAAA CTTGT

( 2 ) INFORMATION FOR SEQ ID NO:8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:8:

ACCCAGCTTT CTTGTACAAA CTTGT

( 2 ) INFORMATION FOR SEQ ID NO:9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:9:

GTTCAGCTTT TTTGTACAAA CTTGT

( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GTTCAGCTTT CTTGTACAAA CTTGT

( 2 ) INFORMATION FOR SEQ ID NO:11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GTTCAGCTTT CTTGTACAAA GTTGG

( 2 ) INFORMATION FOR SEQ ID NO:12:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:12:

AGCCTGCTTT TTTGTACAAA GTTGG

( 2 ) INFORMATION FOR SEQ ID NO:13:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:13:

AGCCTGCTTT CTTGTACAAA GTTGG

( 2 ) INFORMATION FOR SEQ ID NO:14:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:14:

ACCCAGCTTT CTTGTACAAA GTTGG

( 2 ) INFORMATION FOR SEQ ID NO:15:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GTTCAGCTTT TTTGTACAAA GTTGG

( 2 ) INFORMATION FOR SEQ ID NO:16:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:16:

GTTCAGCTTT CTTGTACAAA GTTGG



( 2 ) INFORMATION FOR SEQ ID NO:17: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 39 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: both 
( D ) TOPOLOGY: both 

( i i ) MOLECULE TYPE: cDNA 



( i ) SEQUENCE CHARACTERISTICS: 



i ) MOLECULE TYPE: cDNA 

i ) SEQUENCE DESCRIPTION: SEQ ID NO:18: 



( 2 ) INFORMATION FOR SEQ ID NO:19:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 46 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:19:

GGCCGATTAC GATATCCCAA CGACCGAAAA CCTGTATTTT CAGGGT

( 2 ) INFORMATION FOR SEQ ID NO:20:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 30 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:20:

CAGGTTTTCG GTCGTTGGGA TATCGTAATC

( 2 ) INFORMATION FOR SEQ ID NO:21:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 47 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:21:

GGCCAGATTA CGATATCCCA ACGACCGAAA ACCTGTATTT TCAGGGT

( 2 ) INFORMATION FOR SEQ ID NO:22:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 31 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CAGGTTTTCG GTCGTTGGGA TATCGTAATC T

( 2 ) INFORMATION FOR SEQ ID NO:23:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 48 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:23:

GGCCAAGATT ACGATATCCC AACGACCGAA AACCTGTATT TTCAGGGT

( 2 ) INFORMATION FOR SEQ ID NO:24:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 32 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:24:

CAGGTTTTCG GTCGTTGGGA TATCGTAATC TT

( 2 ) INFORMATION FOR SEQ ID NO:25:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:25:

ACCGTTTACG TGGAC

( 2 ) INFORMATION FOR SEQ ID NO:26:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 31 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TCGAGTCCAC GTAAACGGTT CCCACTTATT A

( 2 ) INFORMATION FOR SEQ ID NO:27:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 39 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:27:

UAUUUUCAGG GUATGGAGAA AAAAATCACT GGATATACC

( 2 ) INFORMATION FOR SEQ ID NO:28:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 33 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:28:

UCCCACUUAU UACGCCCCGC CCTGCCACTC ATC

( 2 ) INFORMATION FOR SEQ ID NO:29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 33 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:29:

UAUUUUCAGG GUATGCCTGT TCTGGAAAAC CGG

( 2 ) INFORMATION FOR SEQ ID NO:30:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 34 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:30:

UCCCACUUAU UATTTCAGCC CCAGGGCGGC TTTC

( 2 ) INFORMATION FOR SEQ ID NO:31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: both 
( D ) TOPOLOGY: both 

( i i ) MOLECULE TYPE: cDNA 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

AGCCTGCTTT TTTATACTAA CTTGA 



( 2 ) INFORMATION FOR SEQ ID NO:32: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 25 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: both
( D ) TOPOLOGY: both

( i i ) MOLECULE TYPE: cDNA 



TCAAGTTAGT ATA. 



AGC AGGCT 



( 2 ) INFORMATION FOR SEQ ID 



( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33: 



( 2 ) INFORMATION FOR SEQ ID 



( D ) TOPOLOGY: both 
( i i ) MOLECULE TYPE: cDNA 
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34: 
AGTTTGT ACAAGAAAGC AGGCT 



( 2 ) INFORMATION FOR SEQ ID NO:35:

( i i ) MOLECULE TYPE: cDNA
( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:35:
ACTTTGT ACAAGAAAGC TGGGT



What is claimed is: 

1. A Vector Donor DNA molecule comprising a first DNA
segment and a second DNA segment, said first or second
DNA segment containing at least one Selectable marker,

wherein

i) said first or second DNA segment is flanked by at
least a first and a second recombination site; and
ii) said first recombination site and said second recombination
site do not recombine with each other.

2. The Vector Donor DNA molecule according to claim 1,
wherein said Selectable marker comprises at least one
inactive fragment of the Selectable marker, wherein the
inactive fragment reconstitutes a functional Selectable
marker when recombined across said first or second recombination
site with a further DNA segment comprising another
inactive fragment of the Selectable marker.

3. The Vector Donor DNA molecule of claim 1, wherein
at least one of said recombination sites is derived from at
least one recombination site selected from the group consisting
of attB, attP, attL, and attR.

4. The Vector Donor DNA molecule according to claim 1,
wherein the Selectable marker comprises at least one DNA
segment selected from the group consisting of:

(i) a DNA segment that encodes a product that provides
resistance against otherwise toxic compounds;

(ii) a DNA segment that encodes a heterologous product;

(iii) a DNA segment that encodes a product that suppresses
the activity of a gene product;

(iv) a DNA segment that encodes a product that is
identifiable;

(v) a DNA segment that encodes a product that inhibits a
cell function;

(vi) a DNA segment that inhibits the activity of any of the
DNA segments of (i)-(v) above;

(vii) a DNA segment that binds a product that modifies a
substrate;

(viii) a DNA segment that provides for the isolation of a
desired molecule;

(ix) a DNA segment that encodes a specific nucleotide
recognition sequence which is recognized by an
enzyme; and

(x) a DNA segment that, when deleted, confers sensitivity
to cell-killing by particular compounds.

5. The Vector Donor DNA molecule according to claim 4,
wherein said Selectable marker comprises at least one
marker selected from the group consisting of an antibiotic
resistance gene, a tRNA gene, an auxotrophic marker, a toxic
gene, a phenotypic marker, an antisense oligonucleotide; a
restriction endonuclease; a restriction endonuclease cleavage
site, an enzyme cleavage site, a protein binding site; and
a sequence complementary to a PCR primer sequence.

6. The Vector Donor DNA molecule according to claim 1, 
wherein said recombination site comprises a DNA sequence 
selected from the group consisting of: 

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB)
(SEQ ID NO:2);

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR)
(SEQ ID NO:3);

(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

and a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A, C, or G or T/U; S=C or G; and B=C, G
or T/U.
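
The degenerate-base codes defined in the preceding claim can be applied mechanically to compare a concrete site against a degenerate sequence. The following is only a sketch: the function name and dictionary are hypothetical, the code set is limited to the symbols defined in the claim, and the sequences are used as transcribed in this document.

```python
# Illustrative sketch only; names are hypothetical.
# Degenerate codes exactly as defined in the claim (T and U treated alike).
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "K": "GT", "Y": "CT", "W": "AT",
    "S": "CG", "B": "CGT", "N": "ACGT",
}


def mismatch_positions(degenerate_seq: str, concrete_seq: str) -> list[int]:
    """Return the 1-based positions at which `concrete_seq` carries a base
    that the code at that position of `degenerate_seq` does not allow."""
    if len(degenerate_seq) != len(concrete_seq):
        raise ValueError("sequences must have the same length")
    concrete = concrete_seq.upper().replace("U", "T")
    return [
        pos
        for pos, (code, base) in enumerate(zip(degenerate_seq.upper(), concrete), start=1)
        if base not in DEGENERATE[code]
    ]


M_ATT = "RKYCWGCTTTYKTRTACNAASTSGB"    # SEQ ID NO:1 as transcribed
ATT_B1 = "AGCCTGCTTTTTTGTACAAACTTGT"   # SEQ ID NO:6 as transcribed

print(mismatch_positions(M_ATT, ATT_B1))
```

An empty list means the concrete sequence is one of the sequences covered by the degenerate formula; a non-empty list gives the positions where it falls outside the formula.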

7. The Vector Donor DNA molecule according to claim 6, 
wherein said DNA sequence comprises a sequence selected 
from the group consisting of: 

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTTCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1) (SEQ
ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (attP2,P3)
(SEQ ID NO:16);
and a corresponding or complementary DNA or RNA
sequence.

8. An Insert Donor DNA molecule, comprising a first 
DNA segment flanked by at least a first recombination site 
and a second recombination site, wherein said first and 
second recombination sites do not recombine with each 
other. 

9. The Insert Donor DNA molecule according to claim 8, 
wherein said desired DNA segment codes for at least one 
marker selected from the group consisting of a cloning site, 
a restriction site, a promoter, an operon, an origin of 
replication, a functional DNA, an antisense RNA, a PCR 
fragment, a protein and a protein fragment. 

10. The Insert Donor DNA molecule according to claim 8, 
wherein said recombination site comprises a DNA sequence 
selected from the group consisting of: 

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB) 
(SEQ ID NO:2); 

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR) 
(SEQ ID NO:3); 
(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

and a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A, C, or G or T/U; S=C or G; and B=C, G
or T/U.

11. The Insert Donor DNA molecule according to claim
10, wherein said DNA sequence comprises a sequence
selected from the group consisting of:

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTTCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1) (SEQ
ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (attP2,P3)
(SEQ ID NO:16);
and a corresponding or complementary DNA or RNA
sequence.

12. A kit comprising at least one Vector Donor DNA
molecule comprising at least a first DNA segment and a
second DNA segment, said first or second DNA segment
containing at least one Selectable marker, wherein said first
or second DNA segment is flanked by at least a first and a
second recombination site, that do not recombine with each
other.

13. The kit according to claim 12 further comprising at
least one Insert Donor DNA molecule comprising a desired
DNA segment flanked by at least a first recombination site
and a second recombination site that do not recombine with
each other.

14. The kit according to claim 12, wherein said recombination
site comprises a DNA sequence selected from the
group consisting of:

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB)
(SEQ ID NO:2);

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR)
(SEQ ID NO:3);

(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

and a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A, C, or G or T/U; S=C or G; and B=C, G
or T/U.

15. The kit according to claim 14, wherein said DNA
sequence comprises a sequence selected from the group
consisting of:

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTTCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1) (SEQ
ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (antP2,P3)
(SEQ ID NO:16);
and a corresponding or complementary DNA or RNA
sequence.

16. A recombinant nucleic acid molecule, comprising at
least one DNA segment comprising at least a first and a
second recombination site flanking a Selectable marker or at
least one desired DNA segment, wherein at least one of said
first and second recombination sites comprises a core
region that enhances recombination efficiency or specificity
in vitro in the formation of a Cointegrate DNA or a Product
DNA, and wherein said first and second sites do not recombine
with each other.

17. A composition, comprising the recombinant nucleic
acid molecule according to claim 16, and a carrier.

18. The recombinant nucleic acid molecule according to
claim [illegible], wherein said recombination sites confer at least
one enhancement selected from the group consisting of (i)
enhancing excisive recombination; (ii) enhancing integrative
recombination; (iii) decreasing the requirement for host
factors; (iv) increasing the efficiency of the formation reaction
or recombination of said Cointegrate DNA or of said
Product DNA; (v) increasing the specificity of the formation
reaction or recombination of said Cointegrate DNA or of
said Product DNA; and (vi) increasing the specificity or
vield of a subsequent recombination reaction of, or
[illegible].

19. The recombinant nucleic acid molecule according to
claim 18 wherein said att site is at least one selected from
the group consisting of att1, att2 and att3.

20. The recombinant nucleic acid molecule according to
claim 16, wherein said at least one of said recombination
sites is from at least one att recombination site.

21. The recombinant nucleic acid molecule according to
claim 20, wherein the att site is at least one selected from the
group consisting of attB, attP, attL and attR.

22. The recombinant nucleic acid according to claim 16,
wherein said core region comprises a DNA sequence
selected from the group consisting of:

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB)
(SEQ ID NO:2);

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR)
(SEQ ID NO:3);

(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

and a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A or C or G or T/U; S=C or G; and B=C or
G or T/U.

23. The recombinant nucleic acid according to claim 22, 
wherein said core region comprises a DNA sequence 
selected from the group consisting of: 

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTTCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1) (SEQ
ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (attP2,P3)
(SEQ ID NO:16);
and a corresponding or complementary DNA or RNA
sequence.

24. A kit, comprising the recombinant nucleic acid 
according to claim 16. 

25. The kit according to claim 24, further comprising at 
least one recombination protein that recognizes at least one 
of said recombination sites. 

26. A recombinant nucleic acid molecule, comprising 

at least one recombination site comprising at least one 
nucleic acid sequence having at least one of SEQ ID 
NOS:l-16, or a complementay DNA sequence or a 
corresponding RNA sequence. 

27. The method according to claim 26, wherein said 
nucleic acid sequence is selected from the group consisting 
of: 

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB)
(SEQ ID NO:2);

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR)
(SEQ ID NO:3);

(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

and a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A, C, G or T/U; S=C or G; and B=C, G or
T/U.
28. The method according to claim 27, wherein said
nucleic acid sequence is selected from the group consisting
of:

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTTCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1) (SEQ
ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (attP2,P3)
(SEQ ID NO:16);
and a corresponding or complementary DNA or RNA
sequence.

29. A method of making a Cointegrate DNA molecule,
comprising combining in vitro.

(i) an Insert Donor DNA molecule, comprising a desired
DNA segment flanked by a first recombination site and
a second recombination site, wherein the first and
second recombination sites do not recombine with each
other;

(ii) a Vector Donor DNA molecule containing a third
recombination site and a fourth recombination site,
wherein the third and fourth recombination sites do not
recombine with each other; and

(iii) at least one site specific recombination protein
capable of recombining said first and third recombinational
sites said second and fourth recombinational
sites;

thereby allowing recombination to occur, so as to produce a
Cointegrate DNA molecule comprising said first and third or
said second and fourth recombination sites.

30. The method as claimed in claim 29, wherein the
Vector Donor DNA molecule comprises a vector segment
flanked by said third and said fourth recombination sites.

31. The method as claimed in claim 29, wherein the
Vector Donor DNA molecule further comprises (a) a toxic
gene and (b) a Selectable marker, wherein said toxic gene
and said Selectable marker are on different DNA segments,
the DNA segments being separated from each other by at
least two recombination sites.

32. The method as claimed in claim 29, wherein the
Vector Donor DNA molecule further comprises (a) a repression
cassette and (b) a Selectable marker that is repressed by
the repressor encoded by said repression cassette, and
wherein the Selectable marker and the repression cassette
are on different DNA segments, the DNA segments being
separated from each other by at least two recombination
sites.

33. The method as claimed in claim 29, wherein at least
one of said Insert Donor DNA molecule and said Vector
Donor DNA molecule is comprised of a circular DNA
molecule.

34. The method as claimed in claim 29, wherein at least
one of said Insert Donor DNA molecule and said Vector
Donor DNA molecule is comprised of a linear DNA molecule.

35. The method of claim 29, further comprising the step
of

producing a Product DNA molecule from said Cointegrate
DNA by recombining at least one of (i) said first and
third, or (ii) said second and fourth, recombination
sites, said Product DNA comprising said desired DNA
segment.

36. The method according to claim 35, wherein said 
method also produces a Byproduct DNA molecule, wherein 
said Byproduct DNA molecule does not contain said desired 
DNA segment and is produced with said Product DNA. 

37. The method according to claim 35, further comprising 
the step of selecting the Product DNA molecule. 

38. A recombinant nucleic acid molecule comprising at
least a first and a second recombination site flanking at least
one DNA segment containing at least one Selectable marker,
wherein said first and second recombination sites do not
recombine with each other.

39. The recombinant nucleic acid molecule of claim 38,
wherein said selectable marker is selected from the group
consisting of:

(i) a DNA segment that encodes a product that provides
resistance against otherwise toxic compounds;

(ii) a DNA segment that encodes a heterologous product;

(iii) a DNA segment that encodes a product that suppresses
the activity of a gene product;

(iv) a DNA segment that encodes a product that is
identifiable;

(v) a DNA segment that encodes a product that inhibits a
cell function;

(vi) a DNA segment that inhibits the activity of any of the
DNA segments of (i) to (v) above;

(vii) a DNA segment that binds a product that modifies a
substrate;

(viii) a DNA segment that provides for isolation of a
desired molecule;

(ix) a DNA segment that encodes a specific nucleotide
recognition sequence which is recognized by an
enzyme; and

(x) a DNA segment that, when deleted, confers sensitivity
to cell killing by a particular compound.

40. The recombinant nucleic acid molecule of claim 38, 
wherein said selectable marker is selected from the group 
consisting of an antibiotic resistance gene, a tRNA gene, an 
auxotrophic marker, a toxic gene, a phenotypic marker, an 
antisense oligonucleotide, a restriction endonuclease, a 
restriction endonuclease cleavage site, an enzyme cleavage 
site, a protein binding site, and a sequence complimentary to 
a PCR primer sequence. 

41. A kit comprising the recombinant nucleic acid mol- 
ecule of claim 38. 

42. The recombinant nucleic acid according to claim 38, 
wherein said DNA segment comprises a cloning site. 
43. The recombinant nucleic acid according to claim 42,
wherein said nucleic acid contains at least one restriction
enzyme site at said cloning site.

44. The recombinant nucleic acid according to claim 38,
wherein said DNA segment further comprises an insert DNA
molecule.

45. The recombinant nucleic acid according to claim 44,
wherein said Insert DNA molecule codes for at least one
marker selected from the group consisting of a restriction
site, a promoter, an operon, an origin of replication, a
functional DNA, an antisense RNA, a PCR fragment, a
protein or a protein fragment.

46. The molecule according to claim 38, wherein said
recombination site comprises a DNA sequence selected from
the group consisting of:

(a) RKYCWGCTTTYKTRTACNAASTSGB (m-att)
(SEQ ID NO:1);

(b) AGCCWGCTTTYKTRTACNAACTSGB (m-attB)
(SEQ ID NO:2);

(c) GTTCAGCTTTCKTRTACNAACTSGB (m-attR)
(SEQ ID NO:3);

(d) AGCCWGCTTTCKTRTACNAAGTSGB (m-attL)
(SEQ ID NO:4);

(e) GTTCAGCTTTYKTRTACNAAGTSGB (m-attP1)
(SEQ ID NO:5);

and a corresponding or complementary DNA or RNA
sequence, wherein R=A or G; K=G or T/U; Y=C or T/U;
W=A or T/U; N=A, C, G or T/U; S=C or G; and B=C, G or
T/U.

47. The molecule according to claim 46, wherein said
DNA sequence comprises a sequence selected from the
group consisting of:

(a) AGCCTGCTTTTTTGTACAAACTTGT (attB1)
(SEQ ID NO:6);

(b) AGCCTGCTTTCTTGTACAAACTTGT (attB2)
(SEQ ID NO:7);

(c) ACCCAGCTTTCTTGTACAAACTTGT (attB3)
(SEQ ID NO:8);

(d) GTTCAGCTTTTTTGTACAAACTTGT (attR1)
(SEQ ID NO:9);

(e) GTTCAGCTTTCTTGTACAAACTTGT (attR2)
(SEQ ID NO:10);

(f) GTTCAGCTTTCTTGTACAAAGTTGG (attR3)
(SEQ ID NO:11);

(g) AGCCTGCTTTTTTGTACAAAGTTGG (attL1)
(SEQ ID NO:12);

(h) AGCCTGCTTTCTTGTACAAAGTTGG (attL2)
(SEQ ID NO:13);

(i) ACCCAGCTTTCTTGTACAAAGTTGG (attL3)
(SEQ ID NO:14);

(j) GTTCAGCTTTTTTGTACAAAGTTGG (attP1) (SEQ
ID NO:15);

(k) GTTCAGCTTTCTTGTACAAAGTTGG (attP2,P3)
(SEQ ID NO:16);
and a corresponding or complementary DNA or RNA
sequence.



UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION 



PATENT NO. : 5,888,732 Page 1 of 2 

DATED : March 30, 1999 

INVENTOR(S) : Hartley et al.



It is certified that error appears in the above-identified patent and that said Letters Patent is
hereby corrected as shown below:

Column 46,

Line 19, please delete "claim 6" and insert therefor -- claim 1 --.

Line 20, please delete "DNA sequence" and insert therefor -- recombination site --.

Column 47,

Line 10, please delete "claim 10" and insert therefor -- claim 8 --.

Line 10, please delete "DNA sequence" and insert therefor -- recombination site --.

Column 48,

Line 1, please delete "claim 14" and insert therefor -- claim 12 --.

Lines 1-2, please delete "DNA sequence" and insert therefor -- recombination site --.

Line 24, please delete "antP2,P3" and insert therefor -- attP2,P3 --.
Line 49, please delete "yield" and insert therefor -- yield --.
Line 55, after "wherein" and before "at", please delete "said".
Line 58, after "site is" please delete "at least one".

Column 49,

Line 11, please delete "claim 22" and insert therefor -- claim 16 --.
Line 48, please delete "complementay" and insert therefor -- complementary --.
Line 50, please delete "method" and insert therefor -- recombinant nucleic acid
molecule --.

Column 50,

Line 1, please delete "method" and insert therefor -- recombinant nucleic acid
molecule --.

Line 1, please delete "claim 27" and insert therefor -- claim 26 --.

Line 29, please delete "vitro." and insert therefor -- vitro: --.

Line 41, after "sites" and before "said", please insert -- and/or --.



UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION 



PATENT NO. : 5,888,732 Page 2 of 2 

DATED : March 30, 1999 

INVENTOR(S) : Hartley et al.



It is certified that error appears in the above-identified patent and that said Letters Patent is 
hereby corrected as shown below: 



Column 52,

Line 30, please delete "claim 46" and insert therefor -- claim 38 --.

Line 31, please delete "DNA sequence" and insert therefor -- recombinant site --.



Signed and Sealed this 
Seventh Day of August, 2001 


NICHOLAS P. GODICI 

Attesting Officer Acting Director of the United States Patent and Trademark Office 



EXHIBIT 5 



Printed in U.S.A.

Purification and Properties of Int-h, a Variant Protein Involved in 
Site-specific Recombination of Bacteriophage λ*

(Received for publication, March 13, 1984) 

Brenda J. Lange-Gustafson‡ and Howard A. Nash§

From the Laboratory of Neurochemistry, National Institute of Mental Health, Bethesda, Maryland 20205 



Under physiological conditions, integration of λ DNA
into the Escherichia coli chromosome requires the di- 
rect participation of only two proteins, the viral int 
gene product and E. coli integration host factor (IHF). 
A variant of the int gene has been isolated that permits 
integrative recombination in cells mutant for one of 
the two subunits of IHF (Miller, H. I., Mozola, M. A., 
and Friedman, D. I. (1980) Cell 20, 721-729). In the 
present work, we have purified Int-h, the product of 
this variant gene. In contrast to the wild-type int gene 
product (Int+), which produces almost no recombinants
in the absence of IHF, purified Int-h protein sponsors 
reduced but significant levels of integrative recombi- 
nation in the absence of any E. coli supplement. This 
shows that the int gene encodes all the information 
necessary for the elementary steps in recombination 
and implies that IHF functions as an accessory protein. 

When supplemented by IHF, recombination promoted by Int-h
resembles that promoted by Int+ in kinetics, stoichiometry
of Int and IHF, and nature of the recombinant product. Under
these conditions, Int-h uses supercoiled DNA more effectively
than nonsupercoiled DNA as a substrate for recombination, as
does Int+. However, in the absence of IHF, Int-h recombines
supercoiled and nonsupercoiled substrates identically,
indicating that IHF is an important part of the mechanism
that senses the supercoiled state of the substrate DNA
during recombination. A surprising difference in
recombination carried out by Int-h in the presence or
absence of IHF concerns the degree to which sites on the
same circle recombine with one another as opposed to sites
on sister molecules. In the presence of IHF, Int-h favors
intramolecular recombination, as does Int+. However, in the
absence of IHF, Int-h almost exclusively promotes
intermolecular recombination.



Bacteriophage λ has a specialized system for the integration
of viral DNA into the bacterial chromosome. This system
carries out a reciprocal recombination between a specific viral
site, attP, and a specific host site, attB. The sequences and
functional extents of these sites are known (for a recent
review, see Ref. 1). A combination of genetic and biochemical
experiments has shown that integrative recombination is carried
out by two proteins: Int, the product of the viral int gene,
and IHF,1 a protein that is composed of two polypeptides, the
products of the E. coli himA and hip genes (1). Several studies 
have led to the conclusion that Int is the protein that carries 
out the breakage and rejoining steps in recombination. First, 
Int binds to the core region of attP and attB, the 15-base pair 
region of homology wherein the recombination crossover oc- 
curs (2). Moreover, Int has a topoisomerase activity that can 
relax supercoiled DNA (3) and can break attachment site 
DNA, albeit at low frequency, precisely at the nucleotides 
within the core that are involved in the crossover (4). Finally, 
Int can promote the exchange of a pair of strands in DNA 
assemblies that have been constructed to resemble recombi- 
nation intermediates (5). Although these findings suggest that 
the role of Int is simply to promote strand exchange, other 
data suggest that it has additional roles. Chemical modifica- 
tion of Int can destroy recombination activity while leaving 
binding to the core and relaxing activity unchanged (4, 6). In 
addition, Int binds to portions of attP that are exterior to the 
core; this binding to the so-called arms of attP appears to be 
essential for recombination activity (2, 7). Analysis of the 
sequences protected by Int in the core and arm regions of attP
indicates that Int is a bifunctional protein that recognizes two 
distinct binding sequences (6, 8). 

The study of mutant proteins may be useful in dissecting 
the various ways in which Int protein promotes integrative 
recombination. In this paper, we begin the analysis of one 
variant, Int-h. This variant was isolated after selection for λ
bacteriophage that could undergo site-specific recombination 
in an E. coli host that was mutant for IHF (9). The mutation 
proved to map in the int gene and in vivo studies indicated 
that the int-h allele produced a protein with an enhanced 
recombination potential. For example, in a strain deleted for 
attB, int-h was superior to int+ in promoting the integration
of λ into secondary bacterial sites. In addition, int-h showed
altered recombination potential for excision, the removal of 
integrated viral DNA (9). In vivo studies have not revealed 
the basis of the enhanced recombination efficiency of the int- 
h allele. It might be that the Int-h protein is altered in its 
capacity to interact with IHF, its affinity for core or arm 
binding sequences, its tendency to form nucleosome-like 
structures at attachment sites (10, 11), its intrinsic topoisom- 
erase activity, etc. Since variation in any of these activities 
could provide a valuable probe for the analysis of the detailed 
mechanism of recombination, we have undertaken the study 
of the Int-h protein. This report presents our data on the 
cloning and purification of Int-h and our initial results on the 
characterization of the recombination capacity of this protein. 



1 The abbreviations used are: IHF, integration host factor; SDS,
sodium dodecyl sulfate; TEMED, N,N,N',N'-tetramethylethylenediamine;
kb, kilobase; bp, base pair.
EXPERIMENTAL PROCEDURES



Bacteria and Bacteriophage— The bacteria used in this work were 
derivatives of the E. coli K12 strain N99. Strain HN356 (constructed 
by R. A. Weisberg, National Institutes of Health, Bethesda, MD) 
contains the recB21 mutation. Strains K5185 and K572 (constructed
by H. Miller, Genentech, Inc., San Francisco, CA) respectively
contain a partial deletion, himA82, and a point mutation, himA42, in the
gene for the α subunit of IHF. Strain K5248 (constructed by H.
Miller) contains a point mutation, hip157, in the gene for the β
subunit of IHF. Strain JD12 (constructed by K. Abremski, E. I.
duPont deNemours & Co., Wilmington, DE) contains both the
himA42 and hip157 mutations.

Bacteriophage strain Y619 (constructed by H. Miller) is λ h int-h
intC226 cI857; it was grown in strain K5185 and individual isolates
were tested for the absence of int-promoted deletions by scoring
sensitivity to EDTA (12). Bacteriophage strain G903 (constructed by
S. Adhya, National Institutes of Health, Bethesda, MD) is λ attB-
attP irU2 xisl redlU imm434 ell cII28 cIIKll.

Plasmids— Plasmid pRSF2124 (13) is a derivative of ColE1 that
contains the Tn3 transposon; it was obtained from L. Enquist, E. I.
duPont deNemours & Co., Wilmington, DE. Plasmid pC22642 (14)
is pRSF2124 containing an EcoRI insert from λ intC226; pHN16 (this
work) is the identical construct except that the EcoRI insert is from
Y619. Recombination substrates were grown, labeled with [3H]thymidine,
and purified as described (15). Plasmids pPA1, pBB105, and
pBP1 are described in Ref. 15; plasmids pBP86 and pBP90 are described
in Ref. 16. A detailed restriction map of the attP insert that
is common to pPA1, pBP86, and pBP90 can be found in Ref. 11.

Proteins— Wild-type Int protein (Int+) was purified as described
(17) from strain HN695, a derivative of strain K5185 containing the
plasmid pC22642. Int-h protein was purified from strain HN700, a
derivative of strain K5185 containing the plasmid pHN16. Wild-type
IHF was purified through Fraction V as described (15) from strain
HN356. Crude extracts of wild-type and mutant IHF were made by 
sonication of strains N99, K5185, K572, K5248, and JD12 as de- 
scribed (17). Restriction endonucleasei 
thesda Research Laboratories and New 
Recombination Assays

Reaction mixtures (27 µl) contained 37 mM Tris-HCl
(pH 7.4), 5.6 mM spermidine, 1.1 mM EDTA, 1.1 mg/ml bovine serum
albumin, 25 to 75 mM KCl, purified IHF, and purified Int as indicated.
In some reactions, purified IHF was replaced by 0.2 µl of sonic extract.
The reaction mixtures also contained either 0.30 µg of a plasmid
containing both attP and attB or 0.6 µg of an equimolar mixture of
two plasmids, one containing attP and the other containing attB.
Unless otherwise noted, the reactions were incubated for 1 h at 25 °C
and then stopped as described for the intramolecular recombination
assay in Ref. 15. Restriction of the recombined DNA was carried out
with 10-50 units of endonuclease for 1 h at 37 °C. The samples were
prepared for gel electrophoresis by addition of 5 µl of a solution
containing 25% (w/v) Ficoll, 2% (w/v) SDS, and 0.1% (w/v) bromphenol
blue and extracted once with about 100 µl of a 24:1 (v/v)
mixture of chloroform and isoamyl alcohol. Agarose gel electrophoresis
was carried out as described (15). To quantitate recombination,
bands were visualized by ethidium bromide fluorescence, cut from the
gel, solubilized at 90 °C with 1 ml of 5 M sodium perchlorate, and



nitrocellulose paper by the method of Southern and hybridized with 
^-labeled DNA as described (11). Acrylamide gel electrophoresis 
~" \ bands were carried out as 



Other Methods 
Topoisomerase Activity — Relaxation assays (21 µl) contained ...
mM Tris-HCl (pH 7.5), 67 mM KCl, 5.25 mM EDTA, 3.0 mg/ml
bovine serum albumin, 1 µg of pPA1 plasmid DNA, and 1 unit of
Fraction III Int as indicated. The reaction mixtures were incubated
at 25 °C, stopped by addition of 2 µl of 10% (w/v) SDS, and diluted
to 6.16 ml with a solution of 50 mM Tris-HCl (pH 8.0) containing 25
mM EDTA. To this was added 6.05 g of cesium chloride and 0.35 ml
of ethidium bromide (10 mg/ml) and 2.5 µl of 14C-labeled supercoiled
pPA1 plasmid DNA. The mixture was centrifuged at 15 °C in a
Beckman type 65 rotor for 60 h at 35,000 rpm. The gradient was
fractionated from the bottom and counted. To assess the extent of
relaxation, the separation between the peak of the 14C-labeled marker
DNA and the peak of the treated DNA was divided by the separation
between the marker DNA and a sample of pPA1 plasmid DNA that
had been completely relaxed by treatment with HeLa cell topoisomerase
I (a gift of Dr. L. Liu, Johns Hopkins University, Baltimore,
MD).
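
The relaxation measure described above is a ratio of two peak separations in the cesium chloride gradient. It can be written as a short Python sketch; the function name `relaxation_extent` and the fraction numbers in the example are hypothetical and are not taken from the paper.

```python
def relaxation_extent(marker_peak, treated_peak, relaxed_peak):
    """Fractional relaxation from gradient peak positions.

    marker_peak:  position of the labeled supercoiled marker DNA
    treated_peak: position of the Int-treated DNA
    relaxed_peak: position of fully relaxed DNA (topoisomerase I control)
    Positions are fraction numbers counted from the bottom of the gradient.
    """
    full_shift = relaxed_peak - marker_peak
    if full_shift == 0:
        raise ValueError("marker and fully relaxed peaks coincide")
    return (treated_peak - marker_peak) / full_shift

# Hypothetical fraction numbers, for illustration only.
print(relaxation_extent(marker_peak=20, treated_peak=26, relaxed_peak=32))
```

A value of 0 means the treated DNA bands with the supercoiled marker, and a value of 1 means it bands with the fully relaxed control.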

SDS Gel Electrophoresis— The gels were slabs 16 cm x 17 cm x 
1.5 mm. The separating gel contained 18% (w/v) acrylamide, 0.5% 
bisacrylamide, 1 M urea, 375 mM Tris-HCl (pH 8.8), 2 mM EDTA, 
0.1% SDS, 0.1% ammonium persulfate, and 0.66% (v/v) TEMED. 
The stacking gel contained 5% acrylamide, 0.13% bisacrylamide, 1 M 
urea, 125 mM Tris-HCl (pH 6.8), 2 mM EDTA, 0.1% SDS, 0.1% 
ammonium persulfate, and 0.08% TEMED. The running buffer was 
0.19 M glycine, 1 M urea, 0.025 M Tris base, and 0.1% SDS. The 
samples were precipitated with trichloroacetic acid and washed with 
acetone as described (15). They were resuspended in 28 µl of 120 mM
Tris-HCl (pH 6.8), 2.4% SDS, 4.8 mM EDTA, 24% (v/v) glycerol,
0.007% bromphenol blue, and 2.25% β-mercaptoethanol. The samples
were then heated to 90 °C for 2 min and electrophoresed at 30 mA 
for 4 h. The gel was stained with 0.05% Coomassie Brilliant Blue in 
a solution of 50% methanol and 7.5% glacial acetic acid for 2 h and 
then destained in 5% methanol containing 7.5% glacial acetic acid. 



Enzyme Purification — In order to provide a rich source of 
Int-h protein, we cloned the int-h gene in a multicopy plasmid. 
As before (18), we employed a λ variant, intC226, in which
the int gene is expressed constitutively from an altered phage 
promoter (19). The structure of the hybrid plasmid is shown 
in Fig. 1. Although cells containing this plasmid grow well, 
subcultures of the original isolate occasionally contain smaller 
plasmids. These probably represent deletions of the plasmid 
shown in Fig. 1 that are created by Int-h promoted recombi- 
nation between attP and sequences on the plasmid that resem- 
ble attB. To minimize the formation of deletions, we trans- 
ferred the plasmid from its original host, a wild-type E. coli, 
to K5185, a strain carrying a deletion in the himA gene. Site-
specific recombination is substantially reduced in this strain
(see below) and, accordingly, we find the hybrid plasmid is 
considerably more stable. For the experiments reported in 
this paper, Int-h was purified from K5185 containing the 
hybrid plasmid. 

To assay Int-h activity, we measured integrative recombi- 
nation in vitro by EcoRI restriction of pBP86, a plasmid 
substrate that contains both attP and attB. Two kinds of 
reaction mixtures were used. In the standard assay, a source
of Int-h was supplemented with a crude extract from wild- 
type E. coli. Using this assay, the purification of Int-h activity 
proceeded exactly as described for wild-type Int protein (17, 
18). The yield and specific activity of the highly purified Int- 
h (Table I) are not substantially different from those found 
after purification of Int+ protein (17). In a second assay, a
source of Int-h was supplemented with a crude extract from 
E. coli carrying a point mutation in the himA gene. This assay 
measures a specific quality of the Int-h protein, i.e. the capac-



Fig. 1. ... produces Int-h ... the int-h allele; only the λ insert and
... are shown. The position of the int gene,
the phage attachment site attP, and target sites for EcoRI (E) and
SmaI (S) endonucleases are shown. The left-pointing arrow indicates
the transcript expected to govern expression of int gene; it begins at
the start point of the PI promoter (made constitutive by the int-
C226 mutation) and ends at the tI terminator (31, 32). W, vector
Table I
Purification of Int-h protein
Volume Protein

I. Crude extract b
II. Differential salt precipitation
III. Phosphocellulose
IV. Calcium-phosphate cellulose

a ... amount of Int that produces maximal recombination (17).

b The yield from 75 liters of culture grown to midlog phase.
c —, not determined.



Table II 

Recombination in vitro and in vivo with Int-h and Int+
In vitro recombination with a plasmid substrate pBP86 was carried
out with purified Int-h or Int+ supplemented by sonicates of the
indicated E. coli strains. Reaction conditions, EcoRI restriction, and
Southern blotting analysis were carried out as described under
"Experimental Procedures." Recombination was quantitated by comparison
of the intensity of the 8.1-kb recombinant bands (see Fig. 4 for
a detailed restriction map); serial dilutions of each reaction mixture
were analyzed to facilitate comparison. In vivo recombination was
determined as the fraction of recombinant progeny following infection
of cells containing either pHN16 or pC22642 with λ attB-attP, strain
G903. The protocol for growth and analysis of this phage is described
in Ref. 12.

Relative recombination a

Wild type
himA42
himA82
None

0.10

1 (50%)
0.004
<0.001
0.002

1 (60%)

0.057

0.003 b

NA c NA

a The recombination observed with wild-type Int and IHF is assigned
a value of 1.0; the actual conversion of substrate to recombinant
under these conditions is given in parentheses.

b This value is at least 10-fold higher than that observed for a
control infection that lacked a source of Int.

c NA, not applicable.
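
Footnote a of Table II defines relative recombination as a ratio to the conversion seen with wild-type Int and IHF. A minimal Python sketch of that normalization follows. The function name is illustrative, the 50% reference is the in vitro conversion quoted in the table, and the 5% test value is hypothetical.

```python
def relative_recombination(percent_converted, reference_percent):
    """Normalize a conversion to the wild-type Int plus IHF reference.

    Both arguments are percentages of substrate converted to recombinant;
    the reference reaction therefore scores 1.0 by construction.
    """
    if reference_percent <= 0:
        raise ValueError("reference conversion must be positive")
    return percent_converted / reference_percent

REFERENCE = 50.0  # percent conversion quoted for the in vitro reference
print(relative_recombination(REFERENCE, REFERENCE))  # the reference itself
print(relative_recombination(5.0, REFERENCE))        # hypothetical 5% conversion
```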



a b c d
Fig. 2. SDS gel electrophoresis of purified Int-h. Lane a, 1.5
µg of Int+ protein. Lane b, 1.5 µg of purified (Fraction IV) Int-h
protein. Lane c, ovalbumin (subunit Mr ~ 43,000). Lane d, integration
host factor (subunit Mr ~ 11,500 and 10,000). Sample preparation
and electrophoresis were carried out as described under
"Experimental Procedures."

ity to carry out recombination in the presence of mutant IHF. 
Wild-type Int protein shows almost no activity in this assay, 
whereas crude extracts containing Int-h produce readily de- 
tectable levels of recombination. During the purification of 
Int-h, the ratio of activities in the two assays remained 
constant (data not shown). Unless otherwise stated, all results
reported in this paper are from experiments that use Int-h or
Int+ purified through Step IV of Table I and Ref. 17.

The purified protein is quite stable. We routinely add
bovine serum albumin (2 mg/ml) to our purified proteins;
under these conditions, Int-h activity is stable for at least 1
year at -70 °C. As found for purified wild-type Int protein,
fractions containing Int-h protein without added bovine
serum albumin show diminished activity after repeated freezing
and thawing. The purified Int-h protein is nearly homogeneous.
As shown in Fig. 2, SDS gel electrophoresis of the
purified material shows a prominent major band that co-migrates
with purified Int+ protein at an apparent Mr ~
40,000. On careful examination, some minor bands are evident;
these are similar in molecular weight and intensity to
those seen in preparations of wild-type Int purified from a
K5185 derivative. The identity in size as well as the similarity
in purification of Int-h and Int+ indicate that the int-h mutation
does not radically alter the Int polypeptide. This hypothesis
is supported by the similar sensitivity of the two
purified proteins to inactivation. Both are readily inactivated
either by incubation at 45 °C or by exposure to N-ethylmaleimide
(data not shown).

Recombination Promoted by Int-h— Table II compares the
efficiency of recombination promoted by Int-h or Int+ in the
presence of different sources of crude IHF. As noted above,
Int-h promotes efficient recombination when supplemented
either with an extract of cells carrying the point mutation
himA42 or supplemented with an extract of wild-type cells.
By contrast, Int+ yields very little recombination with the
mutant extract (line 1 versus 2). Int-h cannot utilize all
mutant extracts equally well. Extracts from cells carrying a
deletion mutation, himA82, assist Int-h promoted recombination
less than one-third as well as do extracts from himA42
(Table II, line 2 versus 3). In addition, extracts from cells
carrying the point mutation hip157 or the double mutation
himA42 hip157 are similar to extracts from himA82 cells in
their capacity to assist Int-h promoted recombination (data
not shown). Compared to these extracts, the enhanced recombination
seen when Int-h is supplemented with extracts from
himA42 cells suggests that the himA42 mutation has not
completely inactivated the α subunit of IHF and that Int-h is
better able than Int+ to utilize the residual activity. The
relative capacities of Int-h and Int+ to promote recombination
with various sources of IHF are not changed by altering the
amount of Int protein, the amount of crude IHF, or the time
of incubation (data not shown).

The recombination observed with extracts from cells containing
the himA82 mutation, a deletion of the himA gene,
shows that Int-h can promote recombination in the complete
absence of a functional himA gene product. This result was
unexpected because it had been reported earlier (9) that, in
vivo, the int-h allele could not suppress the recombination
defect of a strain bearing a deletion of the himA gene.2 We



2 Recent experiments show that in himA deletion strains, int-h can
promote a low level of excision of λ from a secondary bacterial site
(R. Weisberg, personal communication; D. Friedman, personal
communication).
think this failure reflects the limited amount of Int protein
made under the previous conditions since a small but readily 
detected amount of integrative recombination is observed in 
vivo after infection of himA82 cells when Int-h is provided 
from our overproducing plasmid (Table II). 

The ability of Int-h to promote recombination with a wide 
variety of mutant IHF extracts suggests that this protein 
might have recombination activity in the total absence of 
IHF. This is confirmed in the last entry of Table II; this result 
indicates that, to the extent that our preparation is pure, Int- 
h can promote recombination by itself. The amount of IHF- 
independent recombination is not large, about 10% that seen 
when Int-h is supplemented with wild-type IHF. Note that 
this amount is similar to that seen when Int-h is supplemented
with crude extracts from a himA deletion or hip mutant strain.
This means that other proteins found in E. coli, including 
either of the remaining wild-type subunits of IHF, cannot 
assist Int-h in promoting recombination. The capacity of Int- 
h to promote recombination by itself is also demonstrated in 
Fig. 3A. In the absence of IHF, Int-h (lane a) but not Int+
(lane f) produces detectable recombinants. It should be 
pointed out that the mobility of the recombinant fragment 
produced in the absence of IHF is identical to that produced 
in its presence (cf. lanes a and b). This implies that the 
breakage and reunion caused by Int-h acting in the absence 
of IHF occurs at the same sites as observed in the standard 
recombination reaction. 

The remainder of Fig. 3A presents the response of Int+ and
Int-h to increasing amounts of IHF that has been purified 
from wild-type E. coli. Both Int proteins are stimulated by 
IHF. We estimate that the amount of IHF needed to maxi- 
Fig. 3. Recombination promoted by varying amounts of Int
and IHF. Supercoiled pBP86 substrate DNA was incubated in the
presence of 50 mM KCl for 25 min as described under "Experimental
Procedures" with proteins as indicated. In panel A, the reactions
contained 6.0 µg/ml of Int-h (a-e) or Int+ (f-j) and different
concentrations of IHF: a and f, 0.0 µg/ml; b and g, 0.125 µg/ml; c and h, 0.5
µg/ml; d and i, 1.25 µg/ml; e and j, 2.5 µg/ml. In panel B, IHF was
fixed at 2.5 µg/ml and either Int-h (a-e) or Int+ (f-j) was varied: a
and f, 0.15 µg/ml; b and g, 0.30 µg/ml; c and h, 0.60 µg/ml; d and i,
1.20 µg/ml; e and j, 6.00 µg/ml. A pair of arrows indicates the position of
the 8.1-kb recombinant fragment.



Fig. 4. Restriction maps for analysis of intermolecular and
intramolecular recombination. At the top are shown two identical
pBP86 substrate DNA circles. Attachment sites are written as POP'
(attP) and BOB' (attB) where O represents the 15-bp core wherein
the recombination crossover takes place. The position of EcoRI (E)
and PstI (P) restriction sites are marked with arrows. Distances (in
base pairs) between adjacent restriction and/or attachment sites are
written inside each substrate circle. Below the substrate circles are
given the expected fragment lengths following PstI or EcoRI digestion.
The two circular products of intramolecular recombination
between attP and attB are drawn at the bottom left. They are shown
separated from one another but after a typical recombination reaction
they are linked to one another as a catenane (21). The single dimeric
circle that arises from recombination between attP on one circle with
attB on the other is drawn at the lower right. Restriction sites,
attachment sites, and fragment lengths are indicated as for the
substrate.

mally stimulate Int+ is about 3-fold greater than that required
to stimulate Int-h. This modest difference indicates that Int-h
is not greatly altered in its affinity for or capacity to utilize
IHF. Fig. 3B shows the effect of adding increasing amounts
of Int+ or Int-h to a fixed, saturating amount of IHF. Similar
amounts of the two proteins produce similar levels of recombination.
This means that the Int-h mutation has not altered
the number of Int molecules required to carry out recombination.
No less than 35 Int-h molecules are needed per recombination
event. However, as in our earlier studies with Int+
(15), we do not know the extent to which this stoichiometry 
reflects inactive protein in our purified preparation. 

Recombination in the Absence of IHF— Because the occurrence
of λ integrative recombination in the absence of IHF is
unprecedented, we have investigated this reaction in more
detail. Optimal conditions for IHF-independent recombination
promoted by Int-h are slightly different than those observed
for reactions in which Int-h or Int+ are
supplemented by IHF. The optimal ionic strength is lower (25
mM KCl rather than 70 mM), the kinetics are about 2-fold
slower (half-maximal recombination in 30 min rather than 
10-20 min), and more Int protein is required for maximal 
recombination (about 2-fold as much). However, even under 
these optimal conditions, in the absence of IHF, Int-h does 
not recombine more than 15% of the substrate DNA. 

The DNA substrate used in Table II and Fig. 3 contains a 
pair of attachment sites oriented as a direct repeat. Integrative 
recombination promoted by Int-h in the absence of IHF has 
been observed with two other kinds of substrates. One sub- 
strate, pBP90, contains attP and attB on the same circle of 
DNA with the two attachment sites oriented so their core 
sequences form an inverted repeat. The other substrate was a 
pair of circles each of which carries a single attachment site, 
either attP or attB. Both substrates exhibited approximately 
the same capacity for recombination promoted by Int-h in the 
absence of IHF (measured relative to recombination in the 
presence of IHF) as was observed for pBP86 substrate (Table 
III and data not shown). Taken together, these results show 
that the capacity of Int-h to carry out recombination in the 
absence of IHF depends neither on the number of sites per 
circle nor on their orientation (see Ref. 20 for an unusual case 
of λ site-specific recombination that strongly depends on
these factors). 

Could recombination with Int-h in the absence of IHF 
reflect the action of a second protein that contaminates our 
purified Int preparation? This putative contaminant cannot 
be normal IHF because the Int-h protein was purified from a 
strain partially deleted for the himA gene. In addition, SDS
gel electrophoresis reveals no trace of a polypeptide with the 
mobility of the remaining subunit of IHF (Fig. 2), even when 
the gel is greatly overloaded (data not shown). Since any 
putative contaminant would have to be present in very small 
amounts relative to Int, it would have to function catalytically, 
a mode of action not observed with IHF. We attempted to 
test for such an activity in the following way. We inactivated 
Int-h protein by treatment with N-ethylmaleimide and assayed
for residual IHF or some other N-ethylmaleimide-resistant
component that could either activate purified Int+
protein or stimulate purified Int-h protein; no such component
was found. This negative result does not rule out the




a b c 



Fig. 5. Restriction analysis of intermolecular versus intramolecular
recombination in the presence or absence of IHF.
Recombination of supercoiled pBP86 substrate was carried out with
the following concentrations of IHF and Int: lane a, 0.0 µg/ml IHF
and 0.0 µg/ml Int; lane b, 0.0 µg/ml IHF and 6.0 µg/ml Int-h; lane c,
0.83 µg/ml IHF and 6.0 µg/ml Int-h. Reactions were carried out in
the presence of 25 mM KCl for 60 min at 25 °C, treated with PstI
restriction endonuclease, and electrophoresed as described under
"Experimental Procedures." The position of the substrate fragments
is indicated at the left and the positions of the fragments diagnostic
for intermolecular recombination (6.5 kb) and total recombination
(3.7 kb) are indicated at the right.



existence of a helping factor in our preparations of Int-h, but 
the simplest interpretation of our results is that Int-h protein 
has an intrinsic capacity to carry out all the steps required 
for integrative recombination. 

Although many features of IHF-independent recombina- 
tion sponsored by Int-h are similar to those of standard 
integrative recombination, there is one surprising difference. 
IHF-independent recombination preferentially recombines 
attachment sites on different molecules as opposed to sites 
that are situated on the same molecule. The reverse is true 
for integrative recombination that is promoted by either Int+
or Int-h when supplemented by IHF. The bias favoring inter- 
molecular versus intramolecular recombination is revealed by 
restriction analysis. Fig. 4 shows restriction maps for pBP86,
a substrate with directly repeated attachment sites; recombination
between attP and attB leads to the appearance of three
kinds of new bands after digestion with restriction endonuclease
PstI: those that arise strictly from intramolecular recombination
(1.4-kb circle), those that come strictly from
intermolecular recombination (6.5-kb linear fragment), and
those that are produced by both pathways (3.7-kb linear
fragment).3 Fig. 5 shows that when recombination reactions
carried out in the presence or absence of IHF are analyzed by 
restriction with Pstl nuclease, the two linear recombinant 
fragments are produced in different relative yields. In reac- 
tions carried out in the presence of IHF (lane c), the 6.5-kb 
fragment is much less prominent than the 3.7-kb fragment, 
indicating that intermolecular recombination plays only a 
small role and that intramolecular is the favored reaction. 
Intramolecular recombination is also the dominant pathway 
in reaction mixtures with wild-type Int and IHF (data not 
shown), supporting earlier conclusions (11, 21). In contrast, 
in reactions carried out in the absence of IHF (lane b), the 
6.5-kb fragment and 3.7-kb fragment are of similar intensity, 
indicating that intermolecular recombination is a major path- 
way. When the amount of DNA in each band is quantitated 
(Table III), it appears that essentially all of the recombinant 
product comes from the intermolecular pathway in the ab- 
sence of IHF and that less than 5% of the recombinant 
product comes from this route in the presence of IHF. This 
analysis is supported by the appearance of a substantial 
amount of 1.4-kb circular species in reactions carried out in 
the presence of IHF and the virtual absence of such species 
in reactions carried out in the absence of IHF (data not 
shown). The same kind of analysis of intermolecular versus 
intramolecular recombination has been done with two other 
substrates: pBP1, that is similar to pBP86 but has a different
separation between attachment sites (6.8 versus 1.4 kb) and
pBP90, that has attP and attB oriented as an inverted repeat. 
Equivalent results were obtained; in the absence of IHF, 
recombination promoted by Int-h was predominantly inter- 
molecular. 

The conclusion that, in the absence of IHF, Int-h mainly 
promotes intermolecular recombination is confirmed by an 
analysis of the unrestricted products of reaction mixtures. To 
avoid complications of supercoiling, the products of recombi- 
nation of pBP86 substrate were nicked with pancreatic DNase 
and electrophoresed on a high resolution gel system as first 
described by Sundin and Varshavsky (22). Fig. 6, lanes b and 
c, show that recombination in the presence of IHF yields a 



³ Note that restriction with endonuclease EcoRI yields only frag-
ments that are of the third kind. Thus, a bias in intermolecular versus
intramolecular recombination would not have been detected in the
standard assay used to purify Int-h. Note also that recombination
between attP on one molecule with attP on another does not change the
restriction pattern and is not scored in these assays.
Table III

Quantitative comparison of recombination pathways
Recombination was carried out as described in the legend to Fig. 5
and analyzed by treatment with endonuclease PstI and agarose gel
electrophoresis, as described under "Experimental Procedures." The
substrate pBP86 and its recombinant products are described in Fig.
4. Recombination between DNA molecules that contain only a single
attachment site was carried out with supercoiled plasmid pPA1 (con-
taining attP) and EcoRI-linearized plasmid pBB105 (containing
attB). Treatment of these reaction mixtures with endonuclease PstI
yields substrate fragments of 4.2 and 4.9 kb (plus smaller fragments)
and a recombinant fragment of 7.2 kb (plus smaller fragments). wt,
wild-type protein; h, Int-h protein.
ᵃ The cpm in the 3.7-kb fragment was divided by the cpm in the
4.4 plus 5.1-kb fragments from an unrecombined mixture. This ratio
was multiplied by the factor 2.57 to correct for the difference in
fragment sizes; the result is expressed as a percentage.

ᵇ The cpm in the 6.5-kb fragment was divided by the cpm in the
4.4 plus 5.1-kb fragments from an unrecombined mixture. This ratio
was multiplied by the factor 1.45 to correct for the difference in
fragment sizes. This value, the fraction of substrate recombining via
the intermolecular pathway, was divided by the total recombinant
fraction to give the proportion of recombinants that use the inter-
molecular pathway. This result is expressed as a percentage.

ᶜ Average of four experiments.

ᵈ The cpm in the 7.2-kb fragment was divided by the cpm in the
4.2 plus 4.9-kb fragments from an unrecombined mixture. This ratio
was multiplied by 1.265 to correct for the difference in fragment sizes.
The result is expressed as a percentage.

ᵉ For these substrates, intermolecular recombination is the only
pathway possible.

ᶠ Average of three experiments.
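
The arithmetic described in the Table III footnotes for the 3.7-kb and 6.5-kb fragments can be sketched in a few lines of Python. This is an illustration only: the function names and all cpm values are hypothetical, and only the correction factors 2.57 and 1.45 are taken from the footnotes. (The factors are close to the ratio of combined substrate-fragment length to product-fragment length, for example 9.5 kb / 3.7 kb ≈ 2.57, which is presumably how they were obtained.)

```python
# Illustrative sketch of the Table III footnote arithmetic.
# The cpm inputs are invented for the example; only the
# size-correction factors come from the footnotes.

TOTAL_FACTOR = 2.57  # correction applied to the 3.7-kb fragment
INTER_FACTOR = 1.45  # correction applied to the 6.5-kb fragment


def total_recombination_pct(cpm_3_7kb, cpm_substrate):
    """Percent of substrate recombined by either pathway."""
    return 100.0 * TOTAL_FACTOR * cpm_3_7kb / cpm_substrate


def intermolecular_share_pct(cpm_6_5kb, cpm_3_7kb, cpm_substrate):
    """Percent of recombinants that arose by the intermolecular pathway."""
    inter_fraction = INTER_FACTOR * cpm_6_5kb / cpm_substrate
    total_fraction = TOTAL_FACTOR * cpm_3_7kb / cpm_substrate
    return 100.0 * inter_fraction / total_fraction


if __name__ == "__main__":
    # Hypothetical cpm in the 4.4 plus 5.1-kb fragments of an
    # unrecombined mixture.
    substrate = 10000.0
    print(total_recombination_pct(400.0, substrate))
    print(intermolecular_share_pct(700.0, 400.0, substrate))
```

Because the substrate cpm cancels in the second function, the intermolecular share depends only on the ratio of the two product bands and the two correction factors.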

ladder of bands. Each band in the ladder contains catenanes 
between the two circular products of intramolecular recom- 
bination (21), each step of the ladder representing catenanes 
with a different number of interlocks between the two circles 
(22). In contrast, Fig. 6d shows that in the absence of IHF, 
recombination promoted by Int-h yields mostly dimeric cir- 
cles. Isolation of the dimer band and subsequent restriction 
with endonuclease EcoRI confirmed that these circles are the
result of intermolecular recombination between an attP on 
one substrate molecule and an attB on a second molecule 
(data not shown). 

Recombination of Nonsupercoiled DNA — The int-h muta-
tion has been reported to increase λ site-specific recombina-
tion during infection of strains carrying a mutation in gyrB,
the gene encoding the β subunit of DNA gyrase (9). This
suggests that Int-h protein might be more active than Int+ on
DNA substrates that have reduced levels of supercoiling.
Fig. 7 shows a comparison of recombination with supercoiled
and nonsupercoiled substrates. The assays were carried out
in the presence of IHF at low ionic strength, a condition that
is optimal for Int+-promoted recombination of nonsupercoiled
substrates. As previously reported (23, 24), even under these
conditions supercoiled DNA is the better substrate for wild-
type Int protein, recombining about 10 times as fast as non-
supercoiled DNA (Fig. 7a). Int-h protein is similar to Int+ in
the speed with which it promotes recombination of supercoiled
DNA substrates (Fig. 7b). However, Int-h promotes recombi-




[Gel image; lanes a, b, c, d]

Fig. 6. Topological analysis of intermolecular versus intra-
molecular recombination. Supercoiled plasmid pBP86 was recom-
bined as described in the legend to Fig. 5 with the following concen-
trations of IHF and Int: lane a, 0.0 µg/ml IHF and 0.0 µg/ml Int; lane
b, 0.83 µg/ml IHF and 4.0 µg/ml Int+; lane c, 0.50 µg/ml IHF and 1.5
µg/ml Int-h; lane d, 0.0 µg/ml IHF and 6.0 µg/ml Int-h. After 60 min
at 25 °C, the reaction was adjusted to 100 mM KCl, 10 mM MgCl2
and digested for 10 min at 37 °C with sufficient pancreatic DNase to
introduce several nicks per substrate. The samples were then electro-
phoresed as described (22). The position of the following monomeric
(9.4 kb) and dimeric species are indicated: 1, linear monomer; 2,
nicked monomer circle; 3, linear dimer; 4, nicked dimer circle. The
linear species arise from excessive digestion with pancreatic DNase.




[Plots for panels a and b; x-axis: time (min)]

Fig. 7. Recombination of supercoiled and nonsupercoiled
substrates. The substrate was a mixture of two DNAs, the super-
coiled form of pBP86 and the nicked circle form of pBP86dl, a
deletion derivative of pBP86 (constructed in this laboratory by T.
Pollock) that has lost 600 base pairs from the 1.4-kb segment that
separates attP and attB. An aliquot of this substrate (0.5 ng) was
incubated as described in the legend to Fig. 5 with 0.5 µg/ml IHF and
either 4.0 µg/ml Int+ (panel a) or 3.0 µg/ml Int-h (panel b). After
various times, aliquots were removed and stopped by heating to 65 °C,
digested with restriction endonuclease BamHI, and electrophoresed
through polyacrylamide. The radioactivity in a fragment diagnostic
of recombination of the supercoiled substrate (1.4 kb) or nonsuper-
coiled substrate (0.8 kb) was determined as described under "Exper-
imental Procedures." Each value was divided by the radioactivity in
a PstI/BamHI fragment characteristic of the appropriate substrate
and the ratio was adjusted to yield the extent of recombination as in
Table III. The average of four and six experiments is plotted in a and
b, respectively. , supercoiled substrate; , nonsupercoiled sub-
strate.

nation of nonsupercoiled DNA with an initial velocity about
3 times faster than that observed with Int+. We conclude from
these experiments that, when supplemented by IHF, Int-h is
superior to Int+ in promoting recombination of nonsuper-
coiled DNA.

The results just presented show that supercoiling imparts 
a modest benefit to recombination promoted by Int-h. We 
next asked whether this benefit requires the presence of IHF. 
Supercoiled and nonsupercoiled DNA were compared as sub- 
strates for Int-h promoted recombination in the absence of 
IHF. Inspecting gels like those used to generate Fig. 7 indi- 
cated that the efficiency and kinetics of recombination were 
similar for both substrates (data not shown). This impression 
was confirmed by an experiment in which the extent of 
recombination was quantitated. After 20 min of incubation, 
supercoiled and nonsupercoiled substrates both had under- 
gone 3.5% recombination; after 90 min, the yield of recombi- 
nants from both substrates was 14.0%. It appears that Int-h 
protein, by itself, does not distinguish between supercoiled 
and nonsupercoiled DNA substrates. We have considered one 
trivial explanation for this lack of discrimination. It has been 
shown earlier that Int protein contains a topoisomerase activ- 
ity that relaxes supercoiled DNA (4). This activity is intrins- 
ically weak relative to the recombination activity of Int and 
is partially inhibited by IHF (25). If Int-h protein displayed a 
much stronger topoisomerase activity, supercoiled DNA 
might be relaxed in reaction mixtures before recombination 
could occur. However, this hypothesis is not supported by a 
comparison of the relaxing activities of Int+ and Int-h proteins
(Fig. 8). Moreover, no major difference is seen in the relaxa-
tion of the substrate DNA when similar amounts of Int+ and
Int-h are used in recombination reactions (data not shown). 
We conclude that the failure of Int-h to recombine supercoiled 
and nonsupercoiled substrates with different efficiency means 
that IHF is an essential part of the mechanism that senses 
the superhelicity of the recombination substrate. 

DISCUSSION 

Purified Int-h protein carries out substantial amounts of 
integrative recombination in the apparent absence of IHF. 
Thus, one must conclude that Int protein can manifest all the 
activities required for the elementary steps of recombination 
between specific sites. As mentioned in the Introduction, 
earlier observations indicated that wild-type Int carries the 
catalytic center responsible for breakage and reunion. Our 
studies with Int-h confirm these conclusions and demonstrate, 
for the first time, that this protein can also specify the steps 
required for synapsis. Unless the int-h allele creates new 
Fig. 8. Relaxing activity of Int+ and Int-h. Radioactively la-
beled supercoiled plasmid pPA1 (DNA) was incubated as described
under "Experimental Procedures" with identical amounts of Int+
( ) or Int-h ( ). The reactions were analyzed by cesium chlo-
ride-ethidium bromide centrifugation as described under "Experimen-
tal Procedures." The degree of relaxation relative to a fully relaxed
plasmid is plotted as a function of the length of the incubation.



properties rather than enhancing properties inherent in the 
parent gene, we must conclude that wild-type Int also has the 
capacity to carry out synapsis as well as recognition and 
strand exchange. Indeed, when sensitive methods are used to 
probe for recombinant products, wild-type Int does reveal 
some ability to carry out recombination in the absence of 
IHF, albeit at a level 50-fold lower than that observed with 
Int-h acting alone and 500-fold lower than observed in the 
presence of IHF (Table II). By implication, IHF must play an 
accessory role in recombination, enhancing the capacity of 
Int to carry out one or more of the critical steps in recombi- 
nation. 

What mechanism underlies the enhanced capacity of Int-h 
to carry out recombination in strains mutant for IHF? Our 
studies have ruled out three kinds of explanation. First, we 
have shown that the int-h allele does not simply lead to the 
overproduction of an essentially wild-type protein. The 
amount of Int protein we recover from himA cells expressing
Int+ or Int-h alleles is virtually identical, confirming and
extending studies on the rate of synthesis of Int+ and Int-h
polypeptides (9). A second possible explanation for the en- 
hanced activity of the int-h allele is that it specifies an Int 
protein with an enhanced capacity for enzymatic turnover. 
Previous studies had demonstrated that many molecules of 
purified Int+ protein are required to produce a single recom-
binant (15, 26, 27). This implies that Int promotes recombi-
nation by a stoichiometric rather than catalytic action on 
attachment sites. If a molecule of Int-h protein were able to 
participate in more than one recombination event, an intrins- 
ically weak capacity to carry out recombination in the absence 
of IHF would be magnified. However, the same value for the 
amount of protein required to produce a recombinant is 
observed with our purified preparation of Int-h protein as is 
seen with purified Int+. Therefore, an increased capacity for
turnover is not the basis for the Int-h phenotype. A third 
hypothesis for the behavior of Int-h invokes preferential 
interaction between Int-h and IHF, permitting Int-h to utilize 
IHF whose quality or quantity is altered by mutation. This 
hypothesis, which was our favorite at the outset of this work, 
is ruled out as the sole explanation for the behavior of Int-h 
by our observation that Int-h carries out recombination in 
the complete absence of IHF. Titration curves show about a 
3-fold difference in the levels of IHF required to stimulate 
recombination by Int-h and Int+. This small effect and the
enhanced recombination in himA42 (as opposed to other IHF
mutants) indicate that Int-h may have a preferred interaction 
with IHF but this putative alteration cannot be a major factor 
in explaining the Int-h phenotype. 

Many possibilities remain as plausible explanations for the 
behavior of Int-h protein. These can be organized around the 
concept that recombination can be divided into steps of rec- 
ognition, synapsis, and strand exchange. Alteration in recog- 
nition of attachment sites by Int-h is a very straightforward 
hypothesis. For example, Int-h might have a higher affinity 
for the core region of attP than does Int+. Footprinting studies
have shown that IHF binds to segments of attP that flank the
region of the core that is occupied by Int (reviewed in Ref. 1).
Moreover, similar studies have revealed that IHF enhances
the binding of Int+ to the core region.⁴ Thus, it is attractive
to imagine that Int-h can function in the absence of IHF
because it binds more tightly to the core than does Int+. This
argument can easily be extended to explain the enhanced 
capacity of Int-h to promote integration at secondary bacterial 
attachment sites (9). In addition to models invoking altered 



⁴ N. Craig and H. Nash, manuscript in preparation.
recognition of attachment sites, equally attractive hypotheses
can be constructed concerning alterations in the capacity of 
Int-h to carry out the later steps of recombination. For this 
purpose, it is best to consider a formal scheme for synapsis 
and strand exchange embodied by the equation: 

attP* + attB* ⇌ (attP*-attB*) → attL + attR

In this scheme, attP* and attB* represent attachment sites 
loaded with recombination proteins, (attP*-attB*) represents 
the synaptic intermediate in which the two sites are juxta- 
posed with the 15-base pair cores aligned, and attL + attR 
represent the products of strand exchange. Since synapsis is 
postulated to be a reversible process, the efficiency of recom- 
bination will be enhanced by changes that either stabilize the 
synaptic intermediate or accelerate its conversion to a recom- 
binant product. If the alteration in Int-h changes either of 
these characteristics, it would be easy to understand how this 
protein might be better able to utilize what little synaptic 
intermediate might form under restrictive conditions such as 
the absence of IHF. Although our experiments have not yet 
defined which of these mechanisms is responsible for the 
altered recombination activity of Int-h, we feel the present 
work opens the way to a rational investigation of this problem. 
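
The argument that either stabilizing the synaptic intermediate or speeding its conversion raises the yield can be illustrated with a steady-state treatment of the scheme above. The sketch below is not an analysis from the paper: the rate constants k_on, k_off and k_exchange, the function name, and every numerical value are hypothetical and chosen only to show the direction of the effect.

```python
# Steady-state sketch of: attP* + attB* <-> (attP*-attB*) -> attL + attR
# All rate constants and concentrations are hypothetical.


def recombination_rate(k_on, k_off, k_exchange, attp, attb):
    """Rate of product formation with the synaptic intermediate at steady state.

    Setting d[S]/dt = k_on*[P][B] - (k_off + k_exchange)*[S] = 0 gives
    [S] = k_on*[P][B] / (k_off + k_exchange); product forms at k_exchange*[S].
    """
    synaptic = k_on * attp * attb / (k_off + k_exchange)
    return k_exchange * synaptic


if __name__ == "__main__":
    # Restrictive condition: the intermediate falls apart far more
    # often than it goes on to strand exchange.
    baseline = recombination_rate(1.0, 10.0, 1.0, 1.0, 1.0)
    # Stabilized intermediate: slower dissociation (smaller k_off).
    stabilized = recombination_rate(1.0, 1.0, 1.0, 1.0, 1.0)
    # Faster conversion to product (larger k_exchange).
    accelerated = recombination_rate(1.0, 10.0, 10.0, 1.0, 1.0)
    print(baseline, stabilized, accelerated)
```

In this toy model both changes raise the rate above the baseline, which is the qualitative point of the paragraph; it says nothing about which change, if either, actually applies to Int-h.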

Our most surprising finding is that, when acting alone, Int- 
h preferentially recombines two attachment sites that are 
located on separate circles rather than two sites that are 
situated on the same circle. That is to say, in the absence of 
IHF, Int-h promotes intermolecular rather than intramolec- 
ular recombination. Precisely the opposite is true when Int-h 
carries out recombination in the presence of IHF. This same 
bias, intramolecular recombination over intermolecular re- 
combination, has been observed both in vivo (12) and in vitro 
(21) for wild-type Int protein in the presence of IHF. A 
preference for intramolecular recombination is readily under- 
standable since the effective concentration of attachment site 
pairs should be high when the sites are tethered to each other 
by a length of flexible DNA (28, 29). Thus, the preference for 
intermolecular recombination shown by Int-h in the absence 
of IHF is surprising. This bias has been observed with two 
different substrates: one containing directly repeated attach-
ment sites and one containing an inversely repeated pair of
sites. We think, therefore, that the phenomenon is intrinsic 
to IHF-independent recombination and we have considered 
two kinds of explanation for our observations. The first model 
states that, in the absence of IHF, intramolecular recombi- 
nation is suppressed while intermolecular recombination pro- 
ceeds at its normal rate. This could come about as a result of 
Int-h binding to substrate DNA. It is known that, in addition 
to specific binding to attachment sites, Int protein has a
substantial nonspecific affinity for DNA. Footprinting (6) 
and electron microscopic (10) studies have shown that, at 
high ratios of Int to DNA, long stretches of DNA become 
covered with protein. If this were to happen to a recombina- 
tion substrate, the DNA between attachment sites might be 
made sufficiently stiff so that the capacity of two sites on the 
same circle to become juxtaposed would decrease. A second
class of models to explain the bias toward intermolecular
recombination has been suggested by the observation that Int
can aggregate DNA.⁶ If the local concentration of DNA mol-
ecules is raised sufficiently by aggregation, statistical theory
implies that random intermolecular collisions between attach- 
ment sites will predominate over intramolecular events (28, 
29). An interesting example of this kind of phenomenon has 
been recently observed for the joining of cohesive ends by 



DNA ligase in the presence or absence of volume excluders 
like polyethylene glycol. In the absence of polyethylene glycol, 
intramolecular ligation to form circles is the predominant 
mode but when the effective concentration of DNA is raised 
by the addition of polyethylene glycol, circles are not formed 
and linear multimers accumulate (30). We imagine that, in 
the absence of IHF, Int-h may promote aggregation of sub- 
strate DNA either because of nonspecific charge-shielding 
effects or because of interactions between Int proteins bound 
to the DNA. Regardless of which proposed mechanism is 
responsible for the preference for intermolecular recombina- 
tion, it should be emphasized that IHF reverses this bias. 
This means that IHF not only stimulates recombination 
promoted by Int-h and Int+ but also changes the capacity of
Int to bind nonspecifically to DNA and/or to form DNA
aggregates. In this context, it is interesting to note that IHF
prevents the formation of DNA aggregates by Int.⁵

IHF may not be the only E. coli protein that can reverse 
the intermolecular recombination bias of Int-h. Substantial 
amounts of intramolecular recombination are observed when 
Int-h is supplemented with crude extracts from cells carrying 
a deletion in the himA gene.⁶ It may be that many basic
proteins will share with IHF the capacity to interfere with 
non-specific binding to DNA or aggregation of DNA by Int- 
h. We are led to speculate that the action of many DNA 
binding proteins that, like Int, have a significant nonspecific 
binding affinity and a tendency to aggregate is modulated by 
the kind of interaction with accessory proteins that we have 
uncovered in this work. 
2. I have reviewed the above-captioned application, a copy of the pending claims,
and the Office Action mailed on May 4, 2006, as well as the literature cited therein. I understand 
that the examiner in charge of the present application asserts that one of ordinary skill in the art 
would find the present invention "obvious" over the citations of Hartley and Christ & Droge, and 
Crouzet and Christ & Droge, optionally in view of Capecchi. For the following reasons, I 
respectfully disagree. 

3. The Int-h/218 mutant was originally generated with an aim to design a 
recombinase which exhibits an increased binding affinity for so-called core binding sites. The 
latter are present in all att sites (the recombination substrates). This enzyme (and the parental 
mutant Int-h) has never been studied in detail in vitro, i.e., purified and analyzed with DNA 
substrates in the test tube. It was, therefore, not clear whether the enzyme would be active in the 
absence of protein co-factors and negative DNA supercoiling of substrate DNA. However, both 
factors are present in E. coli. Before we transferred this mutant to mammalian cells, we knew 
that the enzyme could catalyze an abnormal reaction in E. coli in the absence of the co-factor 
IHF (Christ and Droge, 1999). However, one has to realize that DNA substrates (whether 
episomal or genomic) are negatively supercoiled inside E. coli. It was, therefore, not obvious to 
one of ordinary skill to deduce from the existing data that the mutant recombinase would work 
inside mammalian cells where the DNA is topologically relaxed. In fact, up to this day, the 
reason why both Int-h and the double mutant Int-h/218 are functional in eukaryotic cells remains 
a mystery. One possibility is that there is an unidentified mammalian co-factor which supports 
the prokaryotic recombinase. Based on these facts, a claim that the invention is "obvious" 
reflects a thorough misunderstanding of the topic. 
METHODS AND COMPOSITIONS FOR
GENOMIC MODIFICATION 



This application is related to U.S. Provisional Patent 
Application Serial No. 60/097,166, filed Aug. 19, 1998, 
from which priority is claimed under 35 USC §119(e)(1),
and which application is incorporated herein by reference in
its entirety.

This invention was made with support under NIH Grant 
R01 DK51834 from the National Institutes of Health, U.S. 
Department of Health and Human Services. Accordingly, the 
United States Government may have certain rights in the 
invention. 

FIELD OF THE INVENTION 

The present invention relates to the field of biotechnology, 
and more specifically to the field of genomic modification.
Disclosed herein are compositions, vectors, and methods of 
use thereof, for the generation of transgenic cells, tissues, 
plants, and animals. The compositions, vectors, and methods 
of the present invention are also useful in gene therapy 
techniques.

BACKGROUND OF THE INVENTION 
Permanent genomic modification has been a long sought 
after goal since the discovery that many human disorders are 
the result of genetic mutations that could, in theory, be
corrected by providing the patient with a non-mutated gene. 
Permanent alterations of the genomes of cells and tissues 
would also be valuable for research applications, commer- 
cial products, protein production, and medical applications. 
Furthermore, genomic modification in the form of trans-
genic animals and plants has become an important approach
for the analysis of gene function, the development of disease 
models, and the design of economically important animals 
and crops. 

A major problem with many genomic modification meth-
ods associated with gene therapy is their lack of perma-
nence. Life-long expression of the introduced gene is
required for correction of genetic diseases. Indeed, sustained 
gene expression is required in most applications, yet current 
methods often rely on vectors that provide only a limited
duration of gene expression. For example, gene expression 
is often curtailed by shut-off of integrated retroviruses, 
destruction of adenovirus-infected cells by the immune
system, and degradation of introduced plasmid DNA
(Anderson, W F, Nature 329:25-30, 1998; Kay, et al, Proc.
Natl. Acad. Sci. USA 94:12744-12746, 1997; Verma and 
Somia, Nature 389:239-242, 1997). Even in shorter-term 
applications, such as therapy designed to kill tumor cells or 
discourage regrowth of endothelial tissue after restenosis 
surgery, the short lifetime of gene expression of current
methods often limits the usefulness of the technique. 

One method for creating permanent genomic modification 
is to employ a strategy whereby the introduced DNA 
becomes part of (i.e., integrated into) the existing chromo-
somes. Of existing methods, only retroviruses provide for
efficient integration. Retroviral integration is random, 
however, thus the added gene sequences can integrate in the 
middle of another gene, or into a region in which the added 
gene sequence is inactive. In addition, a different insertion is 
created in each target cell. This situation creates safety
concerns and produces an undesirable loss of control over 
the procedure. 



Adeno-associated virus (AAV) often integrates at a spe- 
cific region in the human genome. However, vectors derived 
from AAV do not integrate site-specifically due to deletion 
of the toxic rep gene (Flotte and Carter, Gene Therapy 
2:357-362, 1995; Muzyczka, Curr. Topics Microbiol. Immu-
nol. 158:97-129, 1992). The small percentage of the AAV
vector population that eventually integrates does so ran- 
domly. Other methods for genomic modification include 
transfection of DNA using calcium phosphate 
co-precipitation, electroporation, lipofection, 
microinjection, protoplast fusion, particle bombardment, or 
the Ti plasmid (for plants). All of these methods produce 
random integration at low frequency. Homologous recom- 
bination produces site-specific integration, but the frequency 
of such integration is very low. 

Another method that has been considered for the integra- 
tion of heterologous nucleic acid fragments into a chromo- 
some is the use of a site-specific recombinase (an example 
using Cre is described below). Site-specific recombinases 
catalyze the insertion or excision of nucleic acid fragments. 
These enzymes recognize relatively short, unique nucleic 
acid sequences that serve for both recognition and recom-
bination. Examples include Cre (Sternberg and Hamilton, J
Mol Biol 150:467-486, 1981), Flp (Broach, et al, Cell
29:227-234, 1982) and R (Matsuzaki, et al, J Bacteriology
172:610-618, 1990).

One of the most widely studied site-specific recombinases 
is the enzyme Cre from the bacteriophage P1. Cre recom-
bines DNA at a 34 basepair sequence called loxP, which
consists of two thirteen basepair palindromic sequences 
flanking an eight basepair core sequence. Cre can direct 
site-specific integration of a loxP-containing targeting vec- 
tor to a chromosomally placed loxP target in both yeast and 
mammalian cells (Sauer and Henderson, New Biol 
2:441-449, 1990). Use of this strategy for genomic 
modification, however, requires that a chromosome first be 
modified to contain a loxP site (because this sequence is not 
known to occur naturally in any organism but P1
bacteriophage), a procedure which suffers from low fre- 
quency and unpredictability as discussed above. 
Furthermore, the net integration frequency is low due to the 
competing excision reaction also mediated by Cre. Similar 
concerns arise in the conventional use of other, well-known, 
site-specific recombinases. 
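
The 13-8-13 layout of loxP described above can be checked directly. The sketch below uses the commonly cited wild-type loxP sequence, which does not appear in this document and is included here as an assumption; the helper names are illustrative. It shows that the two 13-basepair arms are reverse complements of one another (an inverted repeat) around the 8-basepair core.

```python
# Sketch: verify the 13 + 8 + 13 architecture of a loxP site.
# LOXP is the commonly cited wild-type sequence (an assumption here,
# not taken from the document).

LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def split_loxp(site):
    """Split a 34-bp site into (left arm, core, right arm)."""
    return site[:13], site[13:21], site[21:]


if __name__ == "__main__":
    left, core, right = split_loxp(LOXP)
    print(len(LOXP), len(left), len(core), len(right))
    print(reverse_complement(left) == right)
```

The asymmetric core is what gives the site its orientation, which in turn determines whether recombination between two sites on one molecule excises or inverts the intervening DNA.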

A need still exists, therefore, for a convenient means by
which chromosomes can be permanently modified in a
site-specific manner. The present invention addresses that
need.



BRIEF DESCRIPTION OF THE INVENTION 

Accordingly, in one embodiment, the present invention is 
directed to a method of site-specifically integrating a poly- 
nucleotide sequence of interest in a genome of a eucaryotic 
cell. The method comprises introducing (i) a circular target- 
ing construct, comprising a first recombination site and the 
polynucleotide sequence of interest, and (ii) a site-specific 
recombinase into the eucaryotic cell, wherein the genome of 
the cell comprises a second recombination site native to the 
genome and recombination between the first and second 
recombination sites is facilitated by the site-specific recom- 
binase. The cell is maintained under conditions that allow 
recombination between the first and second recombination 
sites and the recombination is mediated by the site-specific 
recombinase. The result of the recombination is site-specific 
integration of the polynucleotide sequence of interest in the 
genome of the eucaryotic cell. 
The recombinase may be introduced into the cell before,
concurrently with, or after introducing the circular targeting 
construct. Further, the circular targeting construct may com- 
prise other useful components, such as a bacterial origin of 
replication and/or a selectable marker. 

In certain embodiments, the recombinase may facilitate 
recombination between two sites designated recombinase- 
mediated-recombination sites (RMRS) and the RMRS com- 
prises a first DNA sequence (RMRS5'), a core region A, and 
a second DNA sequence (RMRS3') in the relative order 
RMRS5'-core region A-RMRS3'. In this embodiment, for 
example, RMRS may be a loxP site or a FRT site and the 
recombinase may be Cre and FLP, respectively. 

In additional embodiments, (i) the second recombination
site is a pseudo-RMRS site, and the second recombination 
site comprises a first DNA sequence (attT5'), a core region 
B, and a second DNA sequence (attT3') in the relative order 
attT5'-core region B-attT3', and (ii) the first recombination 
site is a hybrid-recombination site comprising RMRS5'-core 
region B-RMRS3' or attT5'-core region B-attT3'. 

In yet further embodiments, the site-specific recombinase 
is a recombinase encoded by a phage selected from the 
group consisting of φC31, TP901-1, and R4. The recombi-
nase may facilitate recombination between a bacterial
genomic recombination site (attB) and a phage genomic 
recombination site (attP), and (i) the second recombination 
site may comprise a pseudo-attP site, and (ii) the first 
recombination site may comprise the attB site or (i) the 
second recombination site may comprise a pseudo-attB site, 
and (ii) the first recombination site may comprise the attP
site.

In another embodiment, (i) attB comprises a first DNA 
sequence (attB5'), a bacterial core region, and a second DNA 
sequence (attB3') in the relative order attB5'-bacterial core 
region-attB3', (ii) attP comprises a first DNA sequence 
(attP5'), a phage core region, and a second DNA sequence 
(attP3') in the relative order attP5'-phage core region-attP3', 
and (iii) wherein the recombinase mediates production of
recombination-product sites that can no longer act as a
substrate for the recombinase, the recombination-product
sites comprising the relative order attB5'-recombination-
product site-attP3' and attP5'-recombination-product site-
attB3'.

In particularly preferred embodiments, (i) the second 
recombination site is a pseudo-attP site, the second recom- 
bination site comprises a first DNA sequence (attT5'), a core 
region B, and a second DNA sequence (attT3') in the relative 
order attT5'-core region B-attT3', (ii) the first recombination 
site is an attB site comprising attB5'-bacterial core region- 
attB3', and (iii) wherein the recombinase mediates production
of recombination-product sites that can no longer act as
a substrate for the recombinase, the recombination-product 
sites comprising the relative order attT5'-recombination- 
product site-attB3'{polynucleotide of interest}attB5'- 
recombination-product site-attT3'. Alternatively, (i) the sec- 
ond recombination site is a pseudo-attB site, and the second 
recombination site comprises a first DNA sequence (attT5'), 
a core region B, and a second DNA sequence (attT3') in the 
relative order attT5'-core region B-attT3', (ii) the first recom- 
bination site is an attP site comprising attP5'-bacterial core 
region-attP3', and (iii) wherein the recombinase mediates
production of recombination-product sites that can no longer
act as a substrate for the recombinase, the recombination-
product sites comprising the relative order attT5'-
recombination-product site-attP3'{polynucleotide of
interest}attP5'-recombination-product site-attT3'.
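
The relative orders recited above can be made concrete with a small model. The following is only an illustrative sketch: the `Site` class and the `integrate` function are hypothetical names introduced here, and the code manipulates labels, not DNA sequences. It assumes a circular donor whose single recombination site crosses over with one genomic site, which places the polynucleotide of interest between two recombination-product junctions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A recombination site written as left arm, core, right arm (labels only)."""
    left: str
    core: str
    right: str

    def order(self) -> str:
        """Relative order of the site's parts, as written in the text."""
        return f"{self.left}-{self.core}-{self.right}"


def integrate(genomic: Site, donor: Site, cargo: str) -> str:
    """Return the relative order of elements after a circular donor carrying
    `donor` and `cargo` integrates at `genomic` (hypothetical helper)."""
    product = "recombination-product site"
    # Each junction joins one arm of the genomic site to one arm of the donor site.
    left_junction = f"{genomic.left}-{product}-{donor.right}"
    right_junction = f"{donor.left}-{product}-{genomic.right}"
    return f"{left_junction}{{{cargo}}}{right_junction}"


pseudo_attP = Site("attT5'", "core region B", "attT3'")
attB = Site("attB5'", "bacterial core region", "attB3'")
print(integrate(pseudo_attP, attB, "polynucleotide of interest"))
```

Run with a pseudo-attP genomic site and an attB donor site, the sketch prints the same relative order recited for the particularly preferred embodiment; swapping in an attP donor site yields the alternative order.
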
In yet further embodiments, the site-specific recombinase
is introduced into the cell as a polypeptide. In alternative
embodiments, the site-specific recombinase is introduced
into the cell as a polynucleotide encoding the recombinase,
and an expression cassette, optionally carried on a transient
expression vector, comprises the polynucleotide encoding
the recombinase.

In another embodiment, the invention is directed to a
vector for site-specific integration of a polynucleotide
sequence into the genome of a eucaryotic cell. The vector
comprises (i) a circular backbone vector, (ii) a polynucleotide
of interest operably linked to a eucaryotic promoter,
and (iii) a first recombination site, wherein the genome of
the cell comprises a second recombination site native to the
genome and recombination between the first and second
recombination sites is facilitated by a site-specific
recombinase.

In certain embodiments, the recombinase normally facilitates
recombination between a bacterial genomic recombination
site (attB) and a phage genomic recombination site
(attP) and the first recombination site may be either attB or
attP.

In still another embodiment, the invention is directed to a
kit for site-specific integration of a polynucleotide sequence
into the genome of a eucaryotic cell. The kit comprises (i)
a vector as described above and (ii) a site-specific
recombinase.

In another embodiment, the invention is directed to a
eucaryotic cell having a modified genome. The modified
genome comprises an integrated polynucleotide sequence of
interest whose integration was mediated by a recombinase
and wherein the integration was into a recombination site
native to the eucaryotic cell genome and the integration
created a recombination-product site comprising the
polynucleotide sequence.

In certain embodiments, the recombination-site product
comprises the components attT5'-recombination-product
site-attB3' and attB5'-recombination-product site-attT3',
wherein (i) the native recombination site is a pseudo-attP
site, and the native recombination site comprises a first DNA
sequence (attT5'), a core region B, and a second DNA
sequence (attT3') in the relative order attT5'-core region
B-attT3', (ii) the integrated polynucleotide sequence
comprises a first recombination site comprising an attB site
comprising attB5'-bacterial core region-attB3', and (iii)
wherein the recombinase mediates production of
recombination-product sites that can no longer act as a
substrate for the recombinase, the recombination-product
sites comprising the relative order attT5'-recombination-
product site-attB3'{polynucleotide of interest}attB5'-
recombination-product site-attT3'. Alternatively, the
recombination-site product comprises the components
attT5'-recombination-product site-attB3' and attB5'-
recombination-product site-attT3', wherein (i) the native
recombination site is a pseudo-attB site, and the native
recombination site comprises a first DNA sequence (attT5'),
a core region B, and a second DNA sequence (attT3') in the
relative order attT5'-core region B-attT3', (ii) the integrated
polynucleotide sequence comprises a first recombination
site comprising an attP site comprising attP5'-phage core
region-attP3', and (iii) wherein the recombinase mediates
production of recombination-product sites that can no longer
act as a substrate for the recombinase, the recombination-
product sites comprising the relative order attT5'-
recombination-product site-attP3'{polynucleotide of
interest}attP5'-recombination-product site-attT3'.
In further embodiments, the subject invention is directed
to transgenic plants and animals comprising at least one cell
as described above, as well as methods of producing the
same.

In yet other embodiments, the invention is directed to
methods of treating a disorder in a subject in need of such
treatment. The method comprises site-specifically integrating
a polynucleotide sequence of interest in a genome of at
least one cell of the subject, wherein the polynucleotide
facilitates production of a product that treats the disorder in
the subject. The site-specific integration may be carried out
in vivo in the subject, or ex vivo in cells and the cells are
then introduced into the subject.

A further embodiment of the invention comprises cells,
tissues, transgenic animals and/or plants whose genomes
have been modified using the methods described herein.

In another aspect, the present invention provides a method
of modifying a genome of a cell. In the method, an attB or
an attP recombination site is [introduced] into the genome of
a cell, wherein (i) the recombination site is recognized by a
recombinase, and (ii) the cell normally does not comprise
the attB or attP site. The vectors described herein and above
are useful in the practice of this aspect of the invention. In
a preferred embodiment, the cell that is being modified is a
eucaryotic cell.

In yet another aspect, the present invention provides
expression cassettes, comprising a polynucleotide encoding
a site-specific recombinase, wherein (i) the recombinase is
encoded by a phage (typically selected from the group
consisting of φC31, TP901-1, and R4) and the recombinase
is operably linked to a eucaryotic promoter. The vectors
described herein and above are useful in the practice of this
aspect of the invention.

These and other embodiments of the present invention
will readily occur to those of ordinary skill in the art in view
of the disclosure herein.

BRIEF DESCRIPTION OF THE FIGURES

FIGS. 1A through 1C are schematics of representative
plasmids useful in evaluating the efficiency of pseudo-lox
recombination sequences. FIG. 1A shows an unmodified
plasmid containing a gene for ampicillin resistance and a
gene for β-galactosidase expression (lacZ) under control of
the CMV promoter (pLCG1). FIG. 1B shows the same
plasmid with wild-type loxP sequences flanking the lacZ
gene (pWTLox²). FIG. 1C shows the plasmid with the ψlox
h7q21 pseudo-lox recombination sequence on one side of
lacZ and a lox sequence with wild-type palindromes and a
pseudo-lox core on the other side (pψloxh7q21).

FIG. 1D shows the DNA sequences of the lox sites from
pWTLox² (top line of FIG. 1D, WT LoxP (SEQ ID NO:20))
and plasmid pψloxh7q21 (bottom lines of FIG. 1D,
ψLoxh7q21 (SEQ ID NO:21) and ψCoreh7q21 (SEQ ID
NO:22)).

FIG. 2 shows the results of an excision assay performed
in human cells as described in the examples. Each of the
tested plasmids was transfected into human 293 cells along
with a Cre expression plasmid. After 72 hours, DNA was
transformed into E. coli and recombinants scored. The
transient excision frequency is expressed as a percentage,
where the value for pWTLox² is set at 100%.

FIG. 3 is a diagram of plasmids used in a transient
integration assay performed in human cells as described in
the examples. pRh7q21 (upper left) was the recipient for an
integration event and included the chromosomal ψlox h7q21
site (open triangle), as well as the gene for tetracycline
resistance. Similar control plasmids bearing either no lox
site or the wild-type loxP site were also constructed.
pDh7q21 (upper right) was the donor plasmid for integration
and included a lox site (open triangle, lox ψcore) comprising
the 8-bp core from ψlox h7q21 and the wild-type loxP
[palindromes]. The plasmid also carried two wild-type loxP
sites (dark triangles). In the presence of Cre, the plasmid
origin of replication and the ampicillin resistance gene are
excised, resulting in integrants that do not have two plasmid
origins. The excised by-product is shown in the lower right.
The site-specific integration product, bearing lacZ flanked
by hybrid lox sites (shaded triangles) in a tetracycline
resistant backbone, is shown at lower left. Parallel donor
plasmids having, in place of ψlox h7q21, either no lox site
or only wild-type loxP sites, were also constructed.

FIGS. 4A through 4E are schematic diagrams of
representative plasmids used in demonstrating function of the
φC31 integrase, as described in the examples. FIG. 4A
shows plasmid pInt, for expression of φC31 integrase in E.
coli; FIG. 4B shows plasmid pCMVInt, for expression of
[φC31 integrase] in mammalian cells; FIG. 4C shows plasmid
pBCPB+, an intramolecular integration assay vector; FIG.
4D shows plasmid p220KattBfull, an EBV vector bearing
attB, the target for integration events; FIG. 4E shows
plasmid pTSAD, the donor for integration events, bearing attP.
KanR, AmpR, ChlorR and HygR are genes for resistance to
kanamycin, ampicillin, chloramphenicol, and hygromycin,
respectively.

FIG. 5 shows along the vertical axis the percent
recombination obtained in the intramolecular integration assay in
E. coli, described in Example 6, when various shortened
versions of φC31 attB (left) and attP (right) were tested. The
name of each site tested corresponds to the length of the att
site in basepairs. The A and B of B33 indicate sites where the
reduction of the site length from 34-bp to 33-bp occurred at
the left or right end of the site, respectively. Similar
nomenclature is used for P39A and P39B. Full refers to the
full length attB.

FIG. 6 shows the percent recombination obtained in the
intramolecular integration assay performed in E. coli when
various substitutions in the attB and/or attP cores were made.
The first column shows the recombination frequency when
attB bears the mutant sequence shown and attP remains
wild-type, the second column shows the recombination
[frequency] when attP bears the mutant sequence, while the
third column shows the recombination frequency when both
attB and attP bear the mutant [sequence]. nd=not
done. [As the figure shows,] most changes in the core
[region] are not [well] tolerated.

FIG. 7 shows the results of a bimolecular integration
[assay] performed in human cells as described in the
examples. Results are shown for human cells carrying three
EBV plasmids, p220K, a negative control lacking attB;
p220KattB35, which carries the minimally sized attB; and
p220KattBfull, carrying the full-sized attB. Integration
frequencies are shown for experiments when no DNA was
transfected, when either the integrase expression plasmid
pCMVInt or the attP-bearing plasmid pTSAD alone was
transfected, or when both pCMVInt and pTSAD together
were transfected. Only the latter conditions, in the presence
of a plasmid bearing attB, lead to integration events.
Integration frequencies were corrected for transfection
frequency to give the accurate corrected integration frequencies
in the last column. p220KattBfull produced the highest
integration frequency at 7.5%.

FIGS. 8A through 8B show pseudo-loxP sequences
identified by computer search, as described in the Examples. The
core sequences are shown in boldface type.
DETAILED DESCRIPTION OF THE
INVENTION 

Throughout this application, various publications, 
patents, and published patent applications are referred to by 
an identifying citation. The disclosures of these publications, 
patents, and published patent specifications referenced in 
this application are hereby incorporated by reference into the 
present disclosure to more fully describe the state of the art 
to which this invention pertains. 

The practice of the present invention will employ, unless 
otherwise indicated, conventional techniques of molecular 
biology, microbiology, cell biology and recombinant DNA, 
which are within the skill of the art. See, e.g., Sambrook, 
Fritsch, and Maniatis, MOLECULAR CLONING: A 
LABORATORY MANUAL, 2nd edition (1989); CUR- 
RENT PROTOCOLS IN MOLECULAR BIOLOGY, (F. M. 
Ausubel et al. eds., 1987); the series METHODS IN ENZY- 
MOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL 
APPROACH (M. J. McPherson, B. D. Hames and G. R. 
Taylor eds., 1995) and ANIMAL CELL CULTURE (R. I. 
Freshney, ed., 1987).

All publications, patents and patent applications cited 
herein, whether supra or infra, are hereby incorporated by 
reference in their entirety. 

As used in this specification and the appended claims, the 
singular forms "a," "an" and "the" include plural references 
unless the content clearly dictates otherwise. Thus, for 
example, reference to "an antigen" includes a mixture of two 
or more such agents. 

Definitions 

"Recombinase" as used herein refers to a group of 
enzymes that can facilitate site specific recombination 
between defined sites, where the sites are physically sepa- 
rated on a single DNA molecule or where the sites reside on 
separate DNA molecules. The DNA sequences of the defined 
recombination sites are not necessarily identical. Within this 
group are several subfamilies including "Integrase" 
(including, for example, Cre and λ integrase) and
"Resolvase/Invertase" (including, for example, φC31
integrase, R4 integrase, and TP-901 integrase). 

By "wild-type recombination site (RS/WT)" is meant a
recombination site normally used by an integrase or
recombinase. For example, λ is a temperate bacteriophage that
infects E. coli. The phage has one attachment site for
recombination (attP) and the E. coli bacterial genome has an
attachment site for recombination (attB). Both of these sites
are wild-type recombination sites for λ integrase. In the
context of the present invention, wild-type recombination
sites occur in the homologous phage/bacteria system.
Accordingly, wild-type recombination sites can be derived
from the homologous system and associated with heterologous
sequences, for example, the attB site can be placed in
other systems to act as a substrate for the integrase.

By "pseudo-recombination site (RS/P)" is meant a site at 
which recombinase can facilitate recombination even 
though the site may not have a sequence identical to the 
sequence of its wild-type recombination site. A pseudo- 
recombination site is typically found in an organism heter- 
ologous to the native phage/bacterial system. For example, 
a φC31 integrase and vector carrying a φC31 wild-type
recombination site can be placed into a eucaryotic cell. The 
wild-type recombination sequence aligns itself with a 
sequence in the eucaryotic cell genome and the integrase 
facilitates a recombination event. When the sequence from 
the genomic site, in the eucaryotic cell, where the integration
of the vector took place (via a recombination event between
the wild-type recombination site in the vector and the
genome) is examined, the sequence at the genomic site
typically has some identity to but may not be identical with
the wild-type bacterial genome recombination site. The
recombination site in the eucaryotic cell is considered to be
a pseudo-recombination site at least because the eucaryotic
cell is heterologous to the normal phage/bacterial cell
system. The size of the pseudo-recombination site can be
determined through the use of a variety of methods
including, but not limited to, (i) sequence alignment
comparisons, (ii) secondary structural comparisons, (iii)
deletion or point mutation analysis to find the functional
limits of the pseudo-recombination site, and (iv) combinations
of the foregoing. Pseudo-recombination sites typically
occur naturally in the genomes of eucaryotic cells (i.e., the
sites are native to the genome) and are functionally
identified as described herein (e.g., see Examples).
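
As an illustration of the sequence alignment comparison mentioned in item (i), the sketch below slides a wild-type site along a genomic sequence and reports windows whose ungapped identity meets a chosen threshold. It is a minimal sketch under stated assumptions, not the search method used in the Examples: the function name `find_candidate_sites`, the threshold, and the toy sequences are all hypothetical, only one strand is scanned, and a sequence match alone does not establish that a site is functional.

```python
def find_candidate_sites(genome: str, wild_type: str, min_identity: float):
    """Return (position, identity) for each genome window whose ungapped
    identity to the wild-type site is at least `min_identity` (0.0 to 1.0)."""
    genome, wild_type = genome.upper(), wild_type.upper()
    n = len(wild_type)
    hits = []
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        # Count positions where the window and the wild-type site agree.
        matches = sum(a == b for a, b in zip(window, wild_type))
        identity = matches / n
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


# Toy sequences chosen only for illustration; they are not real att or lox sites.
toy_genome = "TTTTACGAACGTGGGG"
toy_site = "ACGTACGT"
print(find_candidate_sites(toy_genome, toy_site, 0.8))
```

A fuller search would also scan the reverse complement and would treat any hit only as a candidate to be confirmed functionally, for example by the deletion or point mutation analysis of item (iii).
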

"Hybrid-recombination site (RS/H)" as used herein
refers to a recombination site constructed from portions of
wild-type and/or pseudo-recombination sites. As an
example, a wild-type recombination site may have a short
core region flanked by palindromes. In one embodiment of
a "hybrid-recombination site" the short core region
sequence of the hybrid-recombination site matches a core
sequence of a pseudo-recombination site and the palindromes
of the hybrid-recombination site match the wild-type
recombination site. In an alternative embodiment, the
hybrid-recombination site may be comprised of flanking
sites derived from a pseudo-recombination site and a core
region derived from a wild-type recombination site. Other
combinations of such hybrid-recombination sites will be
evident to those having ordinary skill in the art, in view of
the teachings of the present specification.
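
The first embodiment of a hybrid-recombination site described above (wild-type palindromes around a pseudo-site core) can be sketched as simple string assembly. The example assumes a lox-type site with 13-bp flanking arms and an 8-bp core; the wild-type loxP sequence is the commonly published one, whereas `PSEUDO_SITE` is an invented placeholder rather than any sequence from this specification, and the helper names are hypothetical.

```python
ARM_LEN = 13   # length of each flanking palindrome arm (lox-type site)
CORE_LEN = 8   # length of the core region

# Wild-type loxP written as left arm + core + right arm.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

# Hypothetical pseudo-site, invented for illustration only.
PSEUDO_SITE = "ACAACTTAGTCTA" + "TTTTCCCC" + "TAGACAAAGTTGA"


def split_site(site: str):
    """Split a site into (left arm, core, right arm)."""
    if len(site) != 2 * ARM_LEN + CORE_LEN:
        raise ValueError("site must be 34 bp: 13-bp arm, 8-bp core, 13-bp arm")
    return (site[:ARM_LEN],
            site[ARM_LEN:ARM_LEN + CORE_LEN],
            site[ARM_LEN + CORE_LEN:])


def make_hybrid(wild_type: str, pseudo: str) -> str:
    """Combine the wild-type palindromes with the core of the pseudo-site."""
    wt_left, _, wt_right = split_site(wild_type)
    _, pseudo_core, _ = split_site(pseudo)
    return wt_left + pseudo_core + wt_right


print(make_hybrid(LOXP, PSEUDO_SITE))
```

Calling `make_hybrid(PSEUDO_SITE, LOXP)` instead gives the alternative embodiment, in which the flanking sequences come from the pseudo-site and the core comes from the wild-type site.
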

A recombination site "native" to the genome, as used 
herein, means a recombination site that occurs naturally in 
the genome of a cell (i.e., the sites are not introduced into the
genome, for example, by recombinant means).

By "nucleic acid construct" it is meant a nucleic acid
sequence that has been constructed to comprise one or more
functional units not found together in nature. Examples
include circular, double-stranded, extrachromosomal DNA
molecules (plasmids), cosmids (plasmids containing COS
sequences from lambda phage), viral genomes comprising
non-native nucleic acid sequences, and the like.

By "nucleic acid fragment of interest" it is meant any
nucleic acid fragment that one wishes to insert into a
genome. Suitable examples of nucleic acid fragments of
interest include therapeutic genes, marker genes, control
regions, trait-producing fragments, and the like.

"Therapeutic genes" are those nucleic acid sequences
which encode molecules that provide some therapeutic
benefit to the host, including proteins, functional RNAs
(antisense, hammerhead ribozymes), and the like. One well
known example is the cystic fibrosis transmembrane
conductance regulator (CFTR) gene. The primary physiological
defect in cystic fibrosis is the failure of electrogenic chloride
ion secretion across the epithelia of many organs, including
the lungs. One of the most dangerous aspects of the disorder
is the cycle of recurrent airway infections which gradually
destroy lung function resulting in premature death. Cystic
fibrosis is caused by a variety of mutations in the CFTR
gene. Since the problems arising in cystic fibrosis result
from mutations in a single gene, the possibility exists that
the introduction of a normal copy of the gene into the lung
epithelia could provide a treatment for the disease, or effect
a cure if the gene transfer was permanent.

Other disorders resulting from mutations in a single gene 
(known as monogenic disorders) include alpha-1-antitrypsin
deficiency, chronic granulomatous disease, familial
hypercholesterolemia, Fanconi anemia, Gaucher disease, 
Hunter syndrome, ornithine transcarbamylase deficiency, 
purine nucleoside phosphorylase deficiency, severe com- 
bined immunodeficiency disease (SCID)-ADA, X-linked 
SCID, hemophilia, and the like. 

Therapeutic benefit in other disorders may also result 
from the addition of a protein-encoding therapeutic nucleic 
acid. For example, addition of a nucleic acid encoding an 
immunomodulating protein such as interleukin-2 may be of 
therapeutic benefit for patients suffering from different types 
of cancer. 

A nucleic acid fragment of interest may additionally be a 
"marker nucleic acid" or "marker polypeptide". Marker 
genes encode proteins which can be easily detected in 
transformed cells and are, therefore, useful in the study of 
those cells. Marker genes are being used in bone marrow 
transplantation studies, for example, to investigate the biol- 
ogy of marrow reconstitution and the mechanism of relapse 
in patients. Examples of suitable marker genes include 
beta-galactosidase, green or yellow fluorescent proteins,
chloramphenicol acetyl transferase, luciferase, and the like. 

A nucleic acid fragment of interest may additionally be a 
control region. The term "control region" or "control ele- 
ment" includes all nucleic acid components which are 
operably linked to a DNA fragment and involved in the 
expression of a protein or RNA therefrom. An operable 
linkage is a linkage in which the regulatory DNA fragments 
and the DNA sought to be expressed are connected in such 
a way as to permit coding sequence (the nucleic acids 
encoding the amino acid sequence of a protein) expression. 
The precise nature of the regulatory regions needed for 
coding sequence expression may vary from organism to 
organism, but will in general include a promoter region that, 
in prokaryotes, contains both the promoter (which directs 
the initiation of RNA transcription) as well as the DNA that, 
when transcribed into RNA, will signal synthesis initiation. 
Such regions will normally include those 5' noncoding 
sequences involved with initiation of transcription and 
translation, such as the enhancer, TATA box, capping 
sequence, CAAT sequence, and the like. 

Under some circumstances, the native genome sought to 
be modified contains a functional coding sequence but lacks 
the ability to control the expression of the sequence. In such 
cases it would be of benefit to modify the genome by the 
insertion of control region(s). Such sequences include any 
sequence that functions to modulate replication, transcrip- 
tional or translational regulation, and the like. Examples 
include promoters, signal sequences, propeptide sequences, 
transcription terminators, polyadenylation sequences, 
enhancer sequences, attenuatory sequences, intron splice 
site sequences, and the like. 

A nucleic acid fragment of interest may additionally be a 
trait-producing sequence, by which it is meant a sequence 
conferring some non-native trait upon the organism or cell 
in which the protein encoded by the trait-producing 
sequence is expressed. The term "non-native" when used in 
the context of a trait-producing sequence means that the trait 
produced is different than one would find in an unmodified 
organism which can mean that the organism produces high 
amounts of a natural substance in comparison to an unmodi- 
fied organism, or produces a non-natural substance. For 
example, the genome of a crop plant, such as corn, can be
modified to produce higher amounts of an essential amino
acid, thus creating a plant of higher nutritional quality, or
could be modified to produce proteins not normally produced
in plants, such as antibodies. (See U.S. Pat. No.
5,202,422 (issued Apr. 13, 1993); U.S. Pat. No. 5,639,947 
(Jun. 17, 1997).) Likewise, the genome of industrially 
important microorganisms can be modified to make them 
more useful such as by inserting new metabolic pathways 
with the aim of producing novel metabolites or improving 
both new and existing processes such as the production of 
antibiotics and industrial enzymes. Other useful traits 
include herbicide resistance, antibiotic resistance, disease 
resistance, resistance to adverse environmental conditions 
(e.g., temperature, pH, salt, drought), and the like. 

Methods of transforming cells are well known in the art. 
By "transformed" it is meant a heritable alteration in a cell 
resulting from the uptake of foreign DNA. Suitable methods 
include viral infection, transfection, conjugation, protoplast 
fusion, electroporation, particle gun technology, calcium
phosphate precipitation, direct microinjection, and the like.
The choice of method is generally dependent on the type of
cell being transformed and the circumstances under which
the transformation is taking place (i.e., in vitro, ex vivo, or in
vivo). A general discussion of these methods can be found
in Ausubel et al., Short Protocols in Molecular Biology, 3rd
ed., Wiley & Sons, 1995.

The terms "nucleic acid molecule" and "polynucleotide"
are used interchangeably and refer to a polymeric form of
nucleotides of any length, either deoxyribonucleotides or
ribonucleotides, or analogs thereof. Polynucleotides may
have any three-dimensional structure, and may perform any
function, known or unknown. Non-limiting examples of
polynucleotides include a gene, a gene fragment, exons,
introns, messenger RNA (mRNA), transfer RNA, ribosomal
RNA, ribozymes, cDNA, recombinant polynucleotides,
branched polynucleotides, plasmids, vectors, isolated DNA
of any sequence, isolated RNA of any sequence, nucleic acid
probes, and primers.

A polynucleotide is typically composed of a specific
sequence of four nucleotide bases: adenine (A); cytosine
(C); guanine (G); and thymine (T) (uracil (U) for thymine
(T) when the polynucleotide is RNA). Thus, the term
polynucleotide sequence is the alphabetical representation
of a polynucleotide molecule. This alphabetical representation
can be input into databases in a computer having a
central processing unit and used for bioinformatics
applications such as functional genomics and homology searching.

A "coding sequence" or a sequence which "encodes" a
selected polypeptide, is a nucleic acid molecule which is
transcribed (in the case of DNA) and translated (in the case
of mRNA) into a polypeptide, for example, in vivo when
placed under the control of appropriate regulatory sequences
(or "control elements"). The boundaries of the coding
sequence are typically determined by a start codon at the 5'
(amino) terminus and a translation stop codon at the 3'
(carboxy) terminus. A coding sequence can include, but is
not limited to, cDNA from viral, procaryotic or eucaryotic
mRNA, genomic DNA sequences from viral or procaryotic
DNA, and even synthetic DNA sequences. A transcription
termination sequence may be located 3' to the coding
sequence. Other "control elements" may also be associated
with a coding sequence. A DNA sequence encoding a
polypeptide can be optimized for expression in a selected
cell by using the codons preferred by the selected cell to
represent the DNA copy of the desired polypeptide coding
sequence. "Encoded by" refers to a nucleic acid sequence
which codes for a polypeptide sequence, wherein the
polypeptide sequence or a portion thereof contains an amino 
acid sequence of at least 3 to 5 amino acids, more preferably 
at least 8 to 10 amino acids, and even more preferably at 
least 15 to 20 amino acids from a polypeptide encoded by 
the nucleic acid sequence. Also encompassed are polypep- 
tide sequences which are immunologically identifiable with 
a polypeptide encoded by the sequence. 
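
The statement that a coding sequence is typically bounded by a start codon and a translation stop codon can be illustrated with a short sketch. It assumes the standard genetic code (ATG start; TAA, TAG and TGA stops), considers only the first ATG on the given strand, and ignores introns; the function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def find_coding_sequence(dna: str):
    """Return the stretch from the first ATG through the first in-frame
    stop codon, or None if no complete coding sequence is found."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return None


# Toy sequence for illustration: ATG AAA TTT TAG flanked by extra bases.
print(find_coding_sequence("GGATGAAATTTTAGCC"))
```

In practice the boundaries would be confirmed against the expressed polypeptide, since an in-frame ATG is not necessarily the start codon actually used.
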

"Operably linked" refers to an arrangement of elements 
wherein the components so described are configured so as to 
perform their usual function. Thus, a given promoter that is 
operably linked to a coding sequence (e.g., a reporter 
expression cassette) is capable of effecting the expression of 
the coding sequence when the proper enzymes are present. 
The promoter or other control elements need not be con- 
tiguous with the coding sequence, so long as they function 
to direct the expression thereof. For example, intervening 
untranslated yet transcribed sequences can be present 
between the promoter sequence and the coding sequence and 
the promoter sequence can still be considered "operably 
linked" to the coding sequence. 

A "vector" is capable of transferring gene sequences to 
target cells. Typically, "vector construct," "expression 
vector," and "gene transfer vector," mean any nucleic acid 
construct capable of directing the expression of a gene of 
interest and which can transfer gene sequences to target 
cells. Thus, the term includes cloning, and expression 
vehicles, as well as integrating vectors. 

An "expression cassette" comprises any nucleic acid 
construct capable of directing the expression of a gene/ 
coding sequence of interest. Such cassettes can be con- 
structed into a "vector," "vector construct," "expression 
vector," or "gene transfer vector," in order to transfer the 
expression cassette into target cells. Thus, the term includes 
cloning and expression vehicles, as well as viral vectors. 

Techniques for determining nucleic acid and amino acid 
"sequence identity" also are known in the art. Typically, 
such techniques include determining the nucleotide 
sequence of the mRNA for a gene and/or determining the 
amino acid sequence encoded thereby, and comparing these 
sequences to a second nucleotide or amino acid sequence. In 
general, "identity" refers to an exact nucleotide-to- 
nucleotide or amino acid-to-amino acid correspondence of 
two polynucleotides or polypeptide sequences, respectively. 
Two or more sequences (polynucleotide or amino acid) can 
be compared by determining their "percent identity." The 
percent identity of two sequences, whether nucleic acid or 
amino acid sequences, is the number of exact matches 
between two aligned sequences divided by the length of the 
shorter sequence and multiplied by 100. An approximate
alignment for nucleic acid sequences is provided by the local 
homology algorithm of Smith and Waterman, Advances in 
Applied Mathematics 2:482-489 (1981). This algorithm can 
be applied to amino acid sequences by using the scoring 
matrix developed by Dayhoff, Atlas of Protein Sequences 
and Structure, M. O. Dayhoff ed., 5 suppl. 3:353-358, 
National Biomedical Research Foundation, Washington, 
D.C., USA, and normalized by Gribskov, Nucl. Acids Res. 
14(6): 6745-6763 (1986). An exemplary implementation of 
this algorithm to determine percent identity of a sequence is 
provided by the Genetics Computer Group (Madison, Wis.) 
in the "BestFit" utility application. The default parameters 
for this method are described in the Wisconsin Sequence 
Analysis Package Program Manual, Version 8 (1995) 
(available from Genetics Computer Group, Madison, Wis.). 
A preferred method of establishing percent identity in the 
context of the present invention is to use the MPSRCH 
package of programs copyrighted by the University of
Edinburgh, developed by John F. Collins and Shane S.
Sturrok, and distributed by IntelliGenetics, Inc. (Mountain
View, Calif.). From this suite of packages the Smith-
Waterman algorithm can be employed where default param-
eters are used for the scoring table (for example, gap open
penalty of 12, gap extension penalty of one, and a gap of
six). From the data generated the "Match" value reflects
"sequence identity." Other suitable programs for calculating
the percent identity or similarity between sequences are
generally known in the art; for example, another alignment
program is BLAST, used with default parameters. For
example, BLASTN and BLASTP can be used with the
following default parameters: genetic code=standard; filter=
none; strand=both; cutoff=60; expect=10; Matrix=
BLOSUM62; Descriptions=50 sequences; sort by=HIGH
SCORE; Databases=non-redundant, GenBank+EMBL+
DDBJ+PDB+GenBank CDS translations+Swiss protein+
Spupdate+PIR. Details of these programs can be found at the
WebSite of NCBI/NLM.
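
The percent identity definition given above (exact matches between two aligned sequences, divided by the length of the shorter sequence, multiplied by 100) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned by an external tool and are supplied as equal-length strings in which "-" marks a gap. The function name percent_identity and the gap convention are illustrative assumptions and are not part of BestFit, MPSRCH, or BLAST.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Exact matches are counted column by column (a gap never counts as a
    match), divided by the length of the shorter ungapped sequence, and
    multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a, b = aligned_a.upper(), aligned_b.upper()
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    shorter = min(len(a.replace(gap, "")), len(b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter


if __name__ == "__main__":
    # Seven of eight aligned positions match.
    print(percent_identity("ACGTACGT", "ACGTTCGT"))
    # One gap column; the shorter sequence has seven residues, all matched.
    print(percent_identity("ACGT-CGT", "ACGTACGT"))
```

Dividing by the shorter sequence, as the text specifies, means a short sequence fully contained in a longer one scores 100%; other conventions (for example, dividing by alignment length) would give different values.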

Alternatively, homology can be determined by hybridiza-
tion of polynucleotides under conditions that form stable
duplexes between homologous regions, followed by diges-
tion with single-stranded-specific nuclease(s), and size
determination of the digested fragments. Two DNA, or two
polypeptide sequences are "substantially homologous" to
each other when the sequences exhibit at least about
80%-85%, preferably at least about 85%-90%, more pref-
erably at least about 90%-95%, and most preferably at least
about 95%-98% sequence identity over a defined length of
the molecules, as determined using the methods above. As
used herein, substantially homologous also refers to
sequences showing complete identity to the specified DNA
or polypeptide sequence. DNA sequences that are substan-
tially homologous can be identified in a Southern hybrid-
ization experiment under, for example, stringent conditions,
as defined for that particular system. Defining appropriate
hybridization conditions is within the skill of the art. See,
e.g., Sambrook et al., supra; DNA Cloning, supra; Nucleic
Acid Hybridization, supra.

Two nucleic acid fragments are considered to "selectively
hybridize" as described herein. The degree of sequence
identity between two nucleic acid molecules affects the
efficiency and strength of hybridization events between such
molecules. A partially identical nucleic acid sequence will at
least partially inhibit a completely identical sequence from
hybridizing to a target molecule. Inhibition of hybridization
of the completely identical sequence can be assessed using
hybridization assays that are well known in the art (e.g.,
Southern blot, Northern blot, solution hybridization, or the
like, see Sambrook, et al., Molecular Cloning: A Laboratory
Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.).
Such assays can be conducted using varying degrees of
selectivity, for example, using conditions varying from low
to high stringency. If conditions of low stringency are
employed, the absence of non-specific binding can be
assessed using a secondary probe that lacks even a partial
degree of sequence identity (for example, a probe having
less than about 30% sequence identity with the target
molecule), such that, in the absence of non-specific binding
events, the secondary probe will not hybridize to the target.

When utilizing a hybridization-based detection system, a
nucleic acid probe is chosen that is complementary to a
target nucleic acid sequence, and then by selection of
appropriate conditions the probe and the target sequence
"selectively hybridize," or bind, to each other to form a
hybrid molecule. A nucleic acid molecule that is capable of
hybridizing selectively to a target sequence under "moder-
ately stringent" conditions typically hybridizes under conditions that
allow detection of a target nucleic acid sequence of at least
about 10-14 nucleotides in length having at least approxi-
mately 70% sequence identity with the sequence of the
selected nucleic acid probe. Stringent hybridization condi-
tions typically allow detection of target nucleic acid
sequences of at least about 10-14 nucleotides in length
having a sequence identity of greater than about 90-95%
with the sequence of the selected nucleic acid probe. Hybrid-
ization conditions useful for probe/target hybridization
where the probe and target have a specific degree of
sequence identity, can be determined as is known in the art
(see, for example, Nucleic Acid Hybridization: A Practical
Approach, editors B. D. Hames and S. J. Higgins, (1985)
Oxford; Washington, D.C.; IRL Press).

With respect to stringency conditions for hybridization, it
is well known in the art that numerous equivalent conditions
can be employed to establish a particular stringency by
varying, for example, the following factors: the length and
nature of probe and target sequences, base composition of
the various sequences, concentrations of salts and other
hybridization solution components, the presence or absence
of blocking agents in the hybridization solutions (e.g.,
formamide, dextran sulfate, and polyethylene glycol),
hybridization reaction temperature and time parameters, as
well as, varying wash conditions. The selection of a par-
ticular set of hybridization conditions is made following
standard methods in the art (see, for example, Sambrook, et
al., Molecular Cloning: A Laboratory Manual, Second
Edition, (1989) Cold Spring Harbor, N.Y.).

A first polynucleotide is "derived from" a second poly-
nucleotide if it has the same or substantially the same
basepair sequence as a region of the second polynucleotide,
its cDNA, complements thereof, or if it displays sequence
identity as described above.

A first polypeptide is "derived from" a second polypeptide
if it is (i) encoded by a first polynucleotide derived from a
second polynucleotide, or (ii) displays sequence identity to
the second polypeptide as described above. In the present
invention, when a recombinase is "derived from a phage"
the recombinase need not be explicitly produced by the
phage itself, the phage is simply considered to be the
original source of the recombinase and coding sequences
thereof. Recombinases can, for example, be produced
recombinantly or synthetically, by methods known in the art,
or alternatively, recombinases may be purified from phage
infected bacterial cultures.

"Substantially purified" generally refers to isolation of a
substance (compound, polynucleotide, protein, polypeptide,
polypeptide composition) such that the substance comprises
the majority percent of the sample in which it resides.
Typically in a sample a substantially purified component
comprises 50%, preferably 80%-85%, more preferably
90-95% of the sample. Techniques for purifying polynucle-
otides and polypeptides of interest are well-known in the art
and include, for example, ion-exchange chromatography,
affinity chromatography and sedimentation according to
density.

1.0.0 The Invention

The invention disclosed herein comprises a method of
specifically modifying a genome. In one embodiment of the
method, a cell having a target recombination sequence
(designated attT) is transformed with a nucleic acid con-
struct (a "targeting construct") comprising a second recom-
bination sequence (designated attD) and one or more poly-
nucleotides of interest. Into the same cell a recombinase is
introduced that specifically recognizes the recombination
sequences under conditions such that the nucleic acid
sequence of interest is inserted into the genome via a
recombination event between attT and attD. Alternatively,
the recombinase can be introduced into the cell prior to or
concurrent with introduction of the targeting construct (that
is, transformation with the nucleic acid construct).

The method of the invention is based, in part, on the
discovery that there exist in various genomes specific
nucleic acid sequences, herein called pseudo-recombination
sequences, that may be distinct from wild-type recombina-
tion sequences and that can be recognized by a site-specific
recombinase and used to promote the insertion of heterolo-
gous genes or polynucleotides into the genome. The inven-
tors have identified such pseudo-recombination sequences in
a variety of organisms, including mammals and plants.

1.1.0 Recombinases

Two major families of site-specific recombinases from
bacteria and unicellular yeasts have been described: the
integrase family includes Cre, Flp, R, and λ integrase
(Argos, et al., EMBO J. 5:433-440, 1986) and the resolvase/
invertase family includes some phage integrases, such as,
those of phages φC31, R4, and TP-901 (Hallet and Sherratt,
FEMS Microbiol. Rev. 21:157-178, 1997). While not wish-
ing to be bound by descriptions of mechanisms, strand
exchange catalyzed by site specific recombinases typically
occurs in two steps of (1) cleavage and (2) rejoining involv-
ing a covalent protein-DNA intermediate formed between
the recombinase enzyme and the DNA strand(s).

The nature of the catalytic amino acid residue of the
recombinase enzyme and the line of entry of the nucleophile
can be different for the two recombinase families. For
cleavage catalyzed by the invertase/resolvase family, for
example, the nucleophile hydroxyl is derived from a serine
and the leaving group is the 3'-OH of the deoxyribose. For
the integrase family, the catalytic residue is, for example, a
tyrosine and the leaving group is the 5'-OH. In both recom-
binase families, the rejoining step is the reverse of the
cleavage step. Recombinases particularly useful in the prac-
tice of the invention are those that function in a wide variety
of cell types, in part because they do not require any host
specific factors. Suitable recombinases include Cre, Flp, R,
and the integrases of phages φC31, TP901-1, R4, and the
like. Some characteristics of the two recombinase families
are discussed below.

1.1.1 Cre-like Recombinases

The recombinase activity of Cre has been studied as a
model system for the integrases. Cre is a 38 kD protein
isolated from bacteriophage P1. It catalyzes recombination
at a 34 basepair stretch of DNA called loxP. The loxP site has
the sequence 5'-ATAACTTCGTATA GCATACAT
TATACGAAGTTAT-3' (SEQ ID NO:1) consisting of two
thirteen basepair palindromic repeats flanking an eight base-
pair core sequence. The repeat sequences act as Cre binding
sites with the crossover point occurring in the core. Each
repeat appears to bind one protein molecule wherein the
DNA substrate (one strand) is cleaved and a protein DNA
intermediate is formed having a 3'-phosphotyrosine linkage
between Cre and the cleaved DNA strand. Crystallography
and other studies suggest that four proteins and two loxP
sites form a synapsed structure in which the DNA resembles
models of four-way Holliday-junction intermediates, fol-
lowed by the exchange of a second set of strands to resolve
the intermediate into recombinant products (see, Guo, et al.,
Nature 389:40-46, 1997). The asymmetry of the core region
is responsible for directionality of the recombination reac-
tion. If the two recombination sites are repeated in the same
orientation, the outcome of strand exchange is integration or
excision. If the two sites are placed in the opposite
orientation, the outcome is inversion of the sequence
between the two sites (Yang and Mizuuchi, Structure
5:1401-1406, 1997).

Cre has been shown to be active in a wide variety of
cellular backgrounds including yeast (Sauer, Mol. Cell. Biol.
7:2087-2096, 1987), plants (Albert, et al., Plant J.
7:649-659, 1995; Dale and Ow, Gene 91:79-85, 1990;
Odell, et al., Mol. Gen. Genet. 223:369-378, 1990) and
mammals, including both rodent and human cells (van
Deursen, et al., Proc. Natl. Acad. Sci. USA 92:7376-7380,
1995; Agah, et al., J. Clin. Invest. 100:169-179, 1997;
Baubonis, and Sauer, 21:2025-2029, 1993; Sauer and
Henderson, New Biologist 2:441-449, 1990). As the loxP
site is known only to occur in the P1 phage genome, use of
the enzyme in other cell types requires the prior insertion of
a loxP site into the genome, which using currently available
technologies is generally a low-frequency and random event
with all of the drawbacks inherent in such a procedure. The
loxP site can be targeted to a specific location by using
homologous recombination, but, again, that process occurs
at a very low frequency.

Several studies have suggested the possibility that an
exact match of the loxP sequence is not required for Cre-
mediated recombination (Sternberg, et al., J. Mol. Biol.
150:487-507, 1981; Sauer, J. Mol. Biol. 223:911-928, 1992;
Sauer, Nucleic Acids Research 24:4608-4613, 1996). The
efficiency of recombination, however, has generally been
three to four orders of magnitude lower than with wild-
type loxP. Sauer attempted to identify sequences similar to
loxP in the human genome without success (Sauer, Nucleic
Acids Research 24:4608-4613, 1996).

Flp, a recombinase of the integrase family with similar
properties to Cre, has been identified in strains of Saccha-
romyces cerevisiae that contain 2μ-circle DNA. Flp recog-
nizes a DNA sequence consisting of two thirteen basepair
inverted repeats flanking an eight basepair core sequence
(5'-GAAGTTCCTATAC TTCTAGAA
GAATAGGAACTTC-3'; SEQ ID NO:2) called FRT. A third
repeat follows at the 3' end in the natural sequence but does
not appear to be required for recombinase activity. Like Cre,
Flp is functional in a wide variety of systems including
bacteria (Huang, et al., J Bacteriology 179:6076-6083,
1997), insects (Golic and Lindquist, Cell 59:499-509, 1989;
Golic and Golic, Genetics 144:1693-1711, 1996), plants
(Lyznik, et al., Nucleic Acids Res 21:969-975, 1993) and
mammals. These studies have likewise required that a FRT
sequence be inserted into the genome to be modified.

A related recombinase, known as R, is encoded by the
pSR1 plasmid of the yeast Zygosaccharomyces rouxii
(Araki, et al., J. Mol. Biol. 182:191-203, 1985, herein
incorporated by reference). This recombinase may have
properties similar to those described above.

In the context of the present invention, when a recombi-
nase normally facilitates recombination between two recom-
bination sites and the sites are essentially the same (e.g.,
loxP and Cre), the sites are designated recombinase-
mediated-recombination sites (RMRS).

1.1.2 Resolvase/Integrase Recombinases

Unlike the Cre/λ integrase family of recombinases, mem-
bers of the resolvase subfamily of recombinase enzymes
typically contain an N-terminal catalytic domain having a
high degree (>35%) of sequence homology among the
subfamily members (Crellin and Rood, J Bacteriology 179
(16):5148-5156, 1997; Christiansen, et al., J. Bacteriology
178(17):5164-5173, 1996). Like some of the Cre-type
recombinases, however, some resolvases do not require host
specific accessory factors (Thorpe and Smith, PNAS USA
95:5505-5510, 1998).

The process of strand exchange used by the resolvases is
somewhat different than the process used by Cre. This
process is described but is not intended to be limiting. The
resolvases usually make cuts close to the center of the
crossover site, and the top and bottom strand cuts are often
staggered by 2 basepairs, leaving recessed 5' ends. A protein-
DNA linkage is formed between phosphodiester from the 5'
DNA end and a conserved serine residue close to the amino
terminus of the recombinase. As with the Cre-like
invertases, two protein units are bound at each crossover
site, however, no equivalent to the Holliday junction inter-
mediate is formed (see Stark, et al., Trends in Genetics
8(12):432-439, 1992, incorporated by reference herein).

The nucleic acid sequences recognized as recombination
sites by a subset of the resolvase family, including some
phage integrases, differ in several ways from the recombi-
nation site recognized by Cre. The sites used for recognition
and recombination of the phage and bacterial DNAs (the
native host system) are generally non-identical, although
they typically have a common core region of nucleic acids.
The bacterial sequence is generally called the attB sequence
(bacterial attachment) and the phage sequence is called the
attP sequence (phage attachment). Because they are different
sequences, recombination will result in a stretch of nucleic
acids (called attL or attR for left and right) that is neither an
attB sequence nor an attP sequence, and is probably func-
tionally unrecognizable as a recombination site to the rel-
evant enzyme, thus removing the possibility that the enzyme
will catalyze a second recombination reaction that would
reverse the first.

The individual resolvases and the nucleic acid sequences
that they recognize have been less well characterized than
Cre and Flp, although many of the core sequences have been
identified. The core sequences of some of the resolvases
useful in the practice of the invention can include, without
limitation, the following sequences: φC31-5'-TTG; TP901-
1-5'-TCAAT; and R4-5'-GAAGCAGTGGTA (SEQ ID
NO:3) (See Rausch and Lehmann, NAR 19:5187-5189,
1991; Shirai, et al., J Bacteriology 173(13):4237-4239,
1991; Crellin and Rood, J Bacteriology 179:5148-5156,
1997; Christiansen, et al., J. Bacteriology 176:1069-1076,
1994; Brondsted and Hammer, Applied & Environmental
Microbiology 65:752-758, 1999; all of which are incorpo-
rated by reference herein.)

Several authors have suggested that integrase or resolvase
(for example, φC31 integrase) can be used to modify bac-
terial genomes, such as, those of E. coli and actinomycetes
(Mascarenhas and Olson, U.S. Pat. No. 5,470,727; Cox, et
al., U.S. Pat. No. 5,190,871). However, there has been no
suggestion that these enzymes would be useful in the
modification of non-bacterial genomes.

1.1.3 Recombination Sites

The inventors have discovered native recombination sites
existing in the genomes of a variety of organisms, where the
native recombination site does not necessarily have a nucle-
otide sequence identical to the wild-type recombination
sequences (for a given recombinase); but such native recom-
bination sites are nonetheless sufficient to promote recom-
bination mediated by the recombinase. Such recombination
site sequences are referred to herein as "pseudo-
recombination sequences." For a given recombinase, a
pseudo-recombination sequence is functionally equivalent
to a wild-type recombination sequence, occurs in an organ-
ism other than that in which the recombinase is found in
nature, and may have sequence variation relative to the wild-
type recombination sequences.

In the practice of the present invention, wild-type recom- 
bination sites, pseudo-recombination sites, and hybrid- 
recombination sites can be used in a variety of ways in the 
construction of targeting vectors. Following here are non- 
limiting examples of how these sites may be employed in the 
practice of the present invention. 

Identification of pseudo-recombination sequences can be 
accomplished, for example, by using sequence alignment 
and analysis, where the query sequence is the recombination 
site of interest (for example, a recombinase-mediated- 
recombination site (RMRS; e.g., loxP), or attB and/or
attP of a phage/bacterial system). Following here are some
examples: if a genomic recombination site (generally des- 
ignated attT) is identified using attB, then that attT site is 
said to be a pseudo-attB site; if a genomic recombination site 
is identified using attP, then that attT site is said to be a 
pseudo-attP site; and, if a genomic recombination site is 
identified using an RMRS (e.g., loxP), then that attT site is 
said to be a pseudo-RMRS site (e.g., pseudo-loxP). 
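
As an illustration of how a query recombination site might be used to flag candidate pseudo-recombination sequences, the following minimal Python sketch slides the query along a target sequence and reports windows whose ungapped identity to the query meets a threshold. This is a simplified stand-in, not the search procedure of the specification's Examples: the function name scan_for_candidates, the ungapped window comparison, and the threshold are all assumptions, and only the strand supplied is scanned. The loxP query is the sequence given above as SEQ ID NO:1. Any hit would still need to be tested empirically for function.

```python
from typing import List, Tuple

# loxP as given in the text (SEQ ID NO:1): 13 bp repeat + 8 bp core + 13 bp repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def scan_for_candidates(
    target: str, query: str, min_identity: float
) -> List[Tuple[int, str, float]]:
    """Return (start, window, percent identity) for each ungapped window of
    `target` that matches `query` at or above `min_identity` percent.

    Hypothetical helper: a plain sliding-window comparison on one strand.
    """
    target, query = target.upper(), query.upper()
    n = len(query)
    hits = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(1 for a, b in zip(window, query) if a == b)
        identity = 100.0 * matches / n
        if identity >= min_identity:
            hits.append((start, window, identity))
    return hits


if __name__ == "__main__":
    # Made-up target: a loxP-like site with two mismatches, padded on each side.
    variant = "C" + LOXP[1:-1] + "G"
    for start, window, identity in scan_for_candidates("GGGG" + variant + "CCCC", LOXP, 90.0):
        print(start, window, round(identity, 1))
```

A practical search would also scan the reverse complement and would usually rely on an alignment program such as those named earlier rather than on exact window comparison.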

In one aspect of the present invention, the recombinase 
(for example, Cre) recognizes a recombination site having 
the following structure: flanking sequence palindrome — 
core sequence — flanking sequence palindrome. Such recom- 
bination sites typically comprise two approximately 10-20 
base pair stretches having some palindromic character which 
flank an approximately 3-15 base pair core sequence. 

In this aspect of the present invention, the genome of a 
target cell is searched for sequences having sequence iden-
tity to the selected recombination site for a given
recombinase, for example, loxP (Example 1; FIG. 8). The 
cellular target recombination site (attT: in this example, a 
pseudo-loxP site) accordingly has a defined sequence. To 
practice the genome modification method of the present 
invention, a recombination sequence is placed in the target- 
ing vector. This recombination sequence, attD, can take 
many forms but must be capable of participating in site 
specific recombination with the genomic site (attT) where 
the recombination is mediated by the appropriate recombi- 
nase. In this regard, non-limiting examples of attD sites 
include, but are not limited to, the following: attD core 
sequence matches the pseudo-recombination site core 
sequence, flanking sequences in the targeting construct are 
wild-type recombination sequences (this construct repre- 
sents a hybrid-recombination site); or, attD core sequence 
matches the pseudo-recombination site core sequence, 
flanking sequences in the targeting construct match the 
pseudo-recombination site flanking sequences. Further, the 
core sequences between attT and attD are generally essen- 
tially the same and the flanking sequences for attD may be 
combinations of flanking sequences from wild-type and 
pseudo-recombination site sources. 

The recombinase-mediated-recombination site (RMRS) 
of this type of recombinase, for example, Cre and Cre-like
recombinases, can have the following structure: a first DNA 
sequence (RMRS5'), a core region A, and a second DNA 
sequence (RMRS3') in the relative order RMRS5'-core 
region A-RMRS3'. Such recombination sites typically com- 
prise two approximately 10-20 base pair regions having 
palindromic characteristics (e.g., RMRS5' and RMRS3') 
which flank an approximately 3-15 basepair core sequence 
(for example, core region A). In one embodiment, e.g., when 
employing Cre, hybrid-recombination sites may be used 
where the palindromic sequences are derived from a wild- 
type recombination site and the core sequence is derived 
from a pseudo-recombination site. 
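
The RMRS5'-core-RMRS3' layout and the hybrid-recombination site described above can be sketched in Python as follows. The sketch assumes the loxP geometry stated earlier (two thirteen basepair repeats flanking an eight basepair core) and uses the loxP sequence of SEQ ID NO:1. The pseudo-site sequence is a made-up placeholder for illustration only, and the helper names split_site, reverse_complement, and make_hybrid_site are hypothetical.

```python
ARM_LEN = 13   # length of each flanking repeat in loxP
CORE_LEN = 8   # length of the loxP core

# Wild-type loxP (SEQ ID NO:1 in the text).
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"

# Placeholder pseudo-loxP site, invented for this example only.
HYPOTHETICAL_PSEUDO_SITE = "ATACATACGTATA" + "TTTGGGCC" + "TATACGCAGTCAT"


def split_site(site: str):
    """Split a 34 bp site into (5' arm, core, 3' arm)."""
    if len(site) != 2 * ARM_LEN + CORE_LEN:
        raise ValueError("site must be 34 bp long")
    return site[:ARM_LEN], site[ARM_LEN:ARM_LEN + CORE_LEN], site[ARM_LEN + CORE_LEN:]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def make_hybrid_site(wild_type: str, pseudo_site: str) -> str:
    """Wild-type flanking arms around the core taken from the pseudo-site."""
    left, _, right = split_site(wild_type)
    _, pseudo_core, _ = split_site(pseudo_site)
    return left + pseudo_core + right


if __name__ == "__main__":
    left, core, right = split_site(LOXP)
    # The two loxP arms are inverted repeats of one another.
    print(reverse_complement(left) == right)
    print(make_hybrid_site(LOXP, HYPOTHETICAL_PSEUDO_SITE))
```

Keeping the wild-type arms preserves the recombinase binding sites, while matching the core to the genomic pseudo-site reflects the statement that the core sequences of attT and attD are generally essentially the same.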
Without being bound to any particular theory or mecha-
nism of action, when such a nucleic acid construct is
provided to a cell along with a site-specific recombinase, it
is possible that the recombinase recognizes and binds to the
flanking sequences of both the hybrid-recombination sequence
and the pseudo-recombination sequence from which the
basepair core sequence was derived, and catalyzes the
recombination between the two.

In one embodiment the attD (in the targeting construct) is
a hybrid-lox sequence comprising two wild-type thirteen
basepair loxP palindromes flanking a heterologous core
sequence, where the core sequence corresponds to the core
sequence of the pseudo-recombination sequence of attT (in
the cell target). In a second embodiment the attD (in the
targeting construct) is a hybrid-FRT sequence comprising
two or three wild-type thirteen basepair palindromes flank-
ing a heterologous core sequence, where the core sequence
corresponds to the core sequence of the pseudo-
recombination sequence of attT (in the cell target).

Example 2 describes methods for testing whether a puta-
tive recombination site is functional as a pseudo-
recombination site for recombination mediated by the
selected site specific recombinase and also methods for
assessing the efficiency of recombination.

In a second aspect of the present invention, the recombi-
nase (for example, φC31) recognizes a recombination site
where the sequence of the 5' region of the recombination site can
differ from the sequence of the 3' region of the recombina-
tion sequence. For example, for the phage φC31 attP (the
phage attachment site), the core region is 5'-TTG-3', the
flanking sequences on either side are represented here as
attP5' and attP3', and the structure of the attP recombination site
is, accordingly, attP5'-TTG-attP3'. Correspondingly, for the
native bacterial genomic target site (attB) the core region is
5'-TTG-3', the flanking sequences on either side are
represented here as attB5' and attB3', and the structure of the attB
recombination site is, accordingly, attB5'-TTG-attB3'. After
a single-site, φC31 integrase mediated, recombination event
takes place the result is the following recombination prod-
uct: attB5'-TTG-attP3'{φC31 vector sequences}attP5'-TTG-
attB3'. Typically, after recombination the post-
recombination recombination sites are no longer able to act
as substrate for the φC31 recombinase. This results in stable
integration with little or no recombinase mediated excision.
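
The arrangement of the integration product described above can be traced with a small symbolic Python sketch. It treats each site half and each stretch of DNA as a labelled segment rather than a real sequence, models the circular vector as a list that wraps around, and places the crossover immediately after the shared TTG core. The function name integrate and the segment labels are illustrative assumptions; the sketch reproduces only the order of segments, not the enzymatic mechanism.

```python
from typing import List


def integrate(genome: List[str], vector: List[str], core: str = "TTG") -> List[str]:
    """Insert a circular vector into a linear genome by a single crossover
    placed just after the shared core (hypothetical symbolic model).

    `genome` is an ordered list of segment labels; `vector` is a circular
    list of segment labels. Each must contain `core` exactly once.
    """
    g = genome.index(core)
    v = vector.index(core)
    # Open the circle just after its core, so that the linear form starts
    # with the vector's 3' half-site and ends with its own core.
    linearized = vector[v + 1:] + vector[:v + 1]
    return genome[:g + 1] + linearized + genome[g + 1:]


if __name__ == "__main__":
    genome = ["attB5'", "TTG", "attB3'"]
    vector = ["attP5'", "TTG", "attP3'", "{vector sequences}"]
    print("-".join(integrate(genome, vector)))
```

The resulting order, attB5'-TTG-attP3'-{vector sequences}-attP5'-TTG-attB3', shows that the inserted DNA is flanked by two hybrid sites, each combining one attB half and one attP half, which is why neither junction matches the original attB or attP.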

These structures are represented in a more generic way as
follows: circular targeting vector comprising the recombi-
nation site (attD) and a polynucleotide of interest: attD5'-
core-attD3'; pseudo-recombination site (attT): attT5'-core-
attT3'; post-recombination structure: attT5'-recombination
product site (e.g., core)-attD3'{polynucleotide sequences of
interest}attD5'-recombination product site (e.g., core)-
attT3'. The recombination product site sequence can com-
prise a core identical to the original core sequence. However,
the complete post-recombination recombination sites (for
example, attT5'-recombination product site (e.g., core)-
attD3') generally no longer provide a usable substrate for the
recombinase.

In this aspect, when selecting pseudo-recombination sites
in a target cell (attT), the genomic sequences of the target
cell can be searched for suitable pseudo-recombination sites
using either the attP or attB sequences associated with a
particular recombinase. Functional sizes and the amount of
heterogeneity that can be tolerated in these recombination
sequences can be evaluated, for example, as described in
Examples 8 and 9.

When a pseudo-recombination site is identified using 
either attP or attB search sequences, the other recombination 
site can be used in the targeting construct. For example, if which are bound by a specific sigma factor and RNA

attP for a selected recombinase is used to identify a pseudo- polymerase. Eukaryotic promoters are more complex. Most 

recombination site in the target cell genome, then the promoters utilized in expression vectors are transcribed by 

wild-type attB sequence can be used in the targeting con- RNA polymerase II. General transcription factors (GTFS) 

struct. In an alternative example, if attB for a selected 5 first bind specific sequences near the start and then recruit 

recombinase is used to identify a pseudo-recombination site the binding of RNA polymerase II. In addition to these 

in the target cell genome, then the wild-type attP sequence minimal promoter elements, small sequence elements are 

can be used in the targeting construct. recognized specifically by modular DNA-binding/trans- 

The targeting constructs contemplated by the invention activating proteins (e.g. AP-1, SP-1) that regulate the activ- 

may contain additional nucleic acid fragments such as 10 ity of a given promoter. Viral promoters serve the same 

control sequences, marker sequences, selection sequences function as bacterial or eukaryotic promoters and either 

and the like as discussed below. provide a specific RNA polymerase in trans (bacteriophage 

1.2.0 Targeting Constructs and Methods of the Present T7) or recruit cellular factors and RNA polymerase (SV40, 
Invention RSV, CMV). Viral promoters may be preferred as they are 

The present invention also provides means for targeted 15 generally particularly strong promoters, 
insertion of a polynucleotide (or nucleic acid sequence(s)) of Promoters may be, furthermore, either constitutive or 
interest into a genome by, for example, (i) providing a regulatable (i.e., inducible or derepressible). Inducible ele- 
recombinase, wherein the recombinase is capable of facili- ments are DNA sequence elements which act in conjunction 
tating recombination between a first recombination site and with promoters and bind either repressors (e.g. lacO/LAC Iq 
a second recombination site, (ii) providing a targeting con- 20 repressor system in E. coli) or inducers (e.g. gall/GAL4 
struct having a first recombination sequence and a poly- inducer system in yeast). In either case, transcription is 
nucleotide of interest, (iii) introducing the recombinase and virtually "shut off" until the promoter is derepressed or 
the targeting construct into a cell which contains in its induced, at which point transcription is "turned-on." 
nucleic acid the second recombination site, wherein said introducing is done under conditions that allow the recombinase to facilitate a recombination event between the first and second recombination sites.

Historically, the attachment site in a bacterial genome is designated "attB" and in a corresponding bacteriophage the site is designated "attP". A recombination site in a cell of interest is designated herein as "attT". A recombination site in a targeting vector is referred to herein as "attD".

In one aspect of the present invention, at least one pseudo-recombination site for a selected recombinase is identified in a target cell of interest (attT). These sites can be identified by several methods including searching all known sequences derived from the cell of interest against a wild-type recombination site (e.g., attB or attP) for a selected recombinase (e.g., as described in Example 1). The functionality of pseudo-recombination sites identified in this way can then be empirically evaluated following the teachings of the present specification to determine their ability to participate in a recombinase-mediated recombination event.

1.2.1 Targeting Constructs of the Present Invention

A targeting construct, to direct integration to this pseudo-recombination site, would then comprise a recombination site (attD) wherein the recombinase can facilitate a recombination event between attT and attD, and a polynucleotide of interest. Polynucleotides of interest can include, but are not limited to, expression cassettes encoding polypeptide products. The targeting constructs are typically circular and may also contain selectable markers, an origin of replication, and other elements. Targeting constructs of the present invention are typically circular.

A variety of expression vectors are suitable for use in the practice of the present invention, both for prokaryotic expression and eukaryotic expression. In general, the targeting construct will have one or more of the following features: a promoter, promoter-enhancer sequences, a selection marker sequence, an origin of replication, an inducible element sequence, an epitope-tag sequence, and the like.

Promoter and promoter-enhancer sequences are DNA sequences to which RNA polymerase binds and initiates transcription. The promoter determines the polarity of the transcript by specifying which strand will be transcribed. Bacterial promoters consist of consensus sequences, -35 and -10 nucleotides relative to the transcriptional start,

Examples of constitutive promoters include the int promoter of bacteriophage λ, the bla promoter of the β-lactamase gene sequence of pBR322, the CAT promoter of the chloramphenicol acetyl transferase gene sequence of pPR325, and the like. Examples of inducible prokaryotic promoters include the major right and left promoters of bacteriophage λ (PL and PR), the trp, recA, lacZ, AraC and gal promoters of E. coli, the α-amylase (Ulmanen et al., J. Bacteriol. 162:176-182, 1985) and the sigma-28-specific promoters of B. subtilis (Gilman et al., Gene sequence 32:11-20 (1984)), the promoters of the bacteriophages of Bacillus (Gryczan, In: The Molecular Biology of the Bacilli, Academic Press, Inc., NY (1982)), Streptomyces promoters (Ward et al., Mol. Gen. Genet. 203:468-478, 1986), and the like. Exemplary prokaryotic promoters are reviewed by Glick (J. Ind. Microbiol. 1:277-282, 1987); Cenatiempo (Biochimie 68:505-516, 1986); and Gottesman (Ann. Rev. Genet. 18:415-442, 1984).

Preferred eukaryotic promoters include, but are not limited to, the following: the promoter of the mouse metallothionein I gene sequence (Hamer et al., J. Mol. Appl. Gen. 1:273-288, 1982); the TK promoter of Herpes virus (McKnight, Cell 31:355-365, 1982); the SV40 early promoter (Benoist et al., Nature (London) 290:304-310, 1981); the yeast gal4 gene sequence promoter (Johnston et al., Proc. Natl. Acad. Sci. (USA) 79:6971-6975, 1982; Silver et al., Proc. Natl. Acad. Sci. (USA) 81:5951-5955, 1984), the CMV promoter, the EF-1 promoter, Ecdysone-responsive promoter(s), tetracycline-responsive promoter, and the like. Exemplary promoters for use in the present invention are selected such that they are functional in cell type (and/or animal or plant) into which they are being introduced.

Selection markers are valuable elements in expression vectors as they provide a means to select for growth of only those cells that contain a vector. Such markers are of two types: drug resistance and auxotrophic. A drug resistance marker enables cells to detoxify an exogenously added drug that would otherwise kill the cell. Auxotrophic markers allow cells to synthesize an essential component (usually an amino acid) while grown in media that lacks that essential component.

Common selectable marker genes include those for resistance to antibiotics such as ampicillin, tetracycline, kanamycin, bleomycin, streptomycin, hygromycin,
neomycin, Zeocin™, and the like. Selectable auxotrophic genes include, for example, hisD, that allows growth in histidine free media in the presence of histidinol.

A further element useful in an expression vector is an origin of replication. Replication origins are unique DNA segments that contain multiple short repeated sequences that are recognized by multimeric origin-binding proteins and that play a key role in assembling DNA replication enzymes at the origin site. Suitable origins of replication for use in expression vectors employed herein include E. coli oriC, colE1 plasmid origin, 2μ and ARS (both useful in yeast systems), sfl, SV40, EBV oriP (useful in mammalian systems), and the like.

Epitope tags are short peptide sequences that are recognized by epitope specific antibodies. A fusion protein comprising a recombinant protein and an epitope tag can be simply and easily purified using an antibody bound to a chromatography resin. The presence of the epitope tag furthermore allows the recombinant protein to be detected in subsequent assays, such as Western blots, without having to produce an antibody specific for the recombinant protein itself. Examples of commonly used epitope tags include V5, glutathione-S-transferase (GST), hemagglutinin (HA), the peptide Phe-His-His-Thr-Thr, chitin binding domain, and the like.

A further useful element in an expression vector is a multiple cloning site or polylinker. Synthetic DNA encoding a series of restriction endonuclease recognition sites is inserted into a plasmid vector, for example, downstream of the promoter element. These sites are engineered for convenient cloning of DNA into the vector at a specific position.

The foregoing elements can be combined to produce expression vectors suitable for use in the methods of the invention. Those of skill in the art would be able to select and combine the elements suitable for use in their particular system in view of the teachings of the present specification. Suitable prokaryotic vectors include plasmids such as those capable of replication in E. coli (for example, pBR322, ColE1, pSC101, pACYC 184, πVX, pRSET, pBAD (Invitrogen, Carlsbad, Calif.) and the like). Such plasmids are disclosed by Sambrook (cf. "Molecular Cloning: A Laboratory Manual," second edition, edited by Sambrook, Fritsch, & Maniatis, Cold Spring Harbor Laboratory, (1989)). Bacillus plasmids include pC194, pC221, pT127, and the like, and are disclosed by Gryczan (In: The Molecular Biology of the Bacilli, Academic Press, NY (1982), pp. 307-329). Suitable Streptomyces plasmids include pIJ101 (Kendall et al., J. Bacteriol. 169:4177-4183, 1987), and streptomyces bacteriophages such as φC31 (Chater et al., In: Sixth International Symposium on Actinomycetales Biology, Akademiai Kaido, Budapest, Hungary (1986), pp. 45-54). Pseudomonas plasmids are reviewed by John et al. (Rev. Infect. Dis. 8:693-704, 1986), and Izaki (Jpn. J. Bacteriol. 33:729-742, 1978).

Suitable eukaryotic plasmids include, for example, BPV, EBV, vaccinia, SV40, 2-micron circle, pcDNA3.1, pcDNA3.1/GS, pYES2/GS, pMT, pIND, pIND(Sp1), pVgRXR (Invitrogen), and the like, or their derivatives. Such plasmids are well known in the art (Botstein et al., Miami Wntr. Symp. 19:265-274, 1982; Broach, In: "The Molecular Biology of the Yeast Saccharomyces: Life Cycle and Inheritance", Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., p. 445-470, 1981; Broach, Cell 28:203-204, 1982; Dilon et al., J. Clin. Hematol. Oncol. 10:39-48, 1980; Maniatis, In: Cell Biology: A Comprehensive Treatise, Vol. 3, Gene Sequence Expression, Academic Press, NY, pp. 563-608, 1980).

The targeting cassettes described herein can be constructed utilizing methodologies known in the art of molecular biology (see, for example, Ausubel or Maniatis) in view of the teachings of the specification. As described above, the targeting constructs are assembled by inserting, into a suitable vector backbone, an attD (recombination site), polynucleotides encoding sequences of interest operably linked to a promoter of interest; and, optionally a sequence encoding a positive selection marker.

A preferred method of obtaining polynucleotides, including suitable regulatory sequences (e.g., promoters) is PCR. General procedures for PCR are taught in MacPherson et al., PCR: A PRACTICAL APPROACH, (IRL Press at Oxford University Press, (1991)). PCR conditions for each application reaction may be empirically determined. A number of parameters influence the success of a reaction. Among these parameters are annealing temperature and time, extension time, Mg2+ and ATP concentration, pH, and the relative concentration of primers, templates and deoxyribonucleotides. After amplification, the resulting fragments can be detected by agarose gel electrophoresis followed by visualization with ethidium bromide staining and ultraviolet illumination.

The expression cassettes, targeting constructs, vectors, recombinases and recombinase-coding sequences of the present invention can be formulated into kits. Components of such kits can include, but are not limited to, containers, instructions, solutions, buffers, disposables, and hardware.

1.2.2 Introducing Recombinases

In the methods of the invention a site-specific recombinase is introduced into a cell whose genome is to be modified. Methods of introducing functional proteins into cells are well known in the art. Introduction of purified recombinase protein ensures a transient presence of the protein and its function, which is often a preferred embodiment. Alternatively, a gene encoding the recombinase can be included in an expression vector used to transform the cell. It is generally preferred that the recombinase be present for only such time as is necessary for insertion of the nucleic acid fragments into the genome being modified. Thus, the lack of permanence associated with most expression vectors is not expected to be detrimental.

The recombinases used in the practice of the present invention can be introduced into a target cell before, concurrently with, or after the introduction of a targeting vector. The recombinase can be directly introduced into a cell as a protein, for example, using liposomes, coated particles, or microinjection. Alternately, a polynucleotide encoding the recombinase can be introduced into the cell using a suitable expression vector. The targeting vector components described above are useful in the construction of expression cassettes containing sequences encoding a recombinase of interest. Expression of the recombinase is typically desired to be transient. Accordingly, vectors providing transient expression of the recombinase are preferred in the practice of the present invention. However, expression of the recombinase can be regulated in other ways, for example, by placing the expression of the recombinase under the control of a regulatable promoter (i.e., a promoter whose expression can be selectively induced or repressed).

Sequences encoding recombinases useful in the practice of the present invention are known and include, but are not limited to, the following: Cre: Sternberg, et al., J. Mol. Biol. 187:197-212; φC31: Kuhstoss and Rao, J. Mol. Biol. 222:897-908, 1991; TP901-1: Christiansen, et al., J. Bact. 178:5164-5173, 1996; R4: Matsuura, et al., J. Bact. 178:3374-3376, 1996.
Recombinases for use in the practice of the present invention can be produced recombinantly or purified as previously described. Polypeptides having the desired recombinase activity can be purified to a desired degree of purity by methods known in the art of protein purification, including, but not limited to, ammonium sulfate precipitation, size fractionation, affinity chromatography, HPLC, ion exchange chromatography, heparin agarose affinity chromatography (e.g., Thorpe & Smith, Proc. Nat. Acad. Sci. 95:5505-5510, 1998).

1.2.3 Cells

Cells suitable for modification employing the methods of the invention include both prokaryotic cells and eukaryotic cells, provided that the cell's genome contains a pseudo-recombination sequence. Prokaryotic cells are cells that lack a defined nucleus. Examples of suitable prokaryotic cells include bacterial cells, mycoplasmal cells and archaebacterial cells. Particularly preferred prokaryotic cells include those that are useful either in various types of test systems (discussed in greater detail below) or those that have some industrial utility such as Klebsiella oxytoca (ethanol production), Clostridium acetobutylicum (butanol production), and the like (see Green and Bennet, Biotech & Bioengineering 58:215-221, 1998; Ingram, et al, Biotech & Bioengineering 58:204-206, 1998). Suitable eukaryotic cells include both animal cells (such as from insect, rodent, cow, goat, rabbit, sheep, non-human primate, human, and the like) and plant cells (such as rice, corn, cotton, tobacco, tomato, potato, and the like). Cell types applicable to particular purposes are discussed in greater detail below.

Yet another embodiment of the invention comprises isolated genetically engineered cells. Suitable cells may be prokaryotic or eukaryotic, as discussed above. The genetically engineered cells of the invention may be unicellular organisms or may be derived from multicellular organisms. By "isolated" in reference to genetically engineered cells derived from multicellular organisms it is meant the cells are outside a living body, whether plant or animal, and in an artificial environment. The use of the term isolated does not imply that the genetically engineered cells are the only cells present.

In one embodiment, the genetically engineered cells of the invention contain any one of the nucleic acid constructs of the invention. In a second embodiment, a recombinase that specifically recognizes recombination sequences is introduced into genetically engineered cells containing one of the nucleic acid constructs of the invention under conditions such that the nucleic acid sequence(s) of interest will be inserted into the genome. Thus, the genetically engineered cells possess a modified genome. Methods of introducing such a recombinase are well known in the art and are discussed above.

The genetically engineered cells of the invention can be employed in a variety of ways. Unicellular organisms can be modified to produce commercially valuable substances such as recombinant proteins, industrial solvents, industrially useful enzymes, and the like. Preferred unicellular organisms include fungi such as yeast (for example, S. pombe, Pichia pastoris, S. cerevisiae (such as INVSc1), and the like), Aspergillus, and the like, and bacteria such as Klebsiella, Streptomyces, and the like.

Isolated cells from multicellular organisms can be similarly useful, including insect cells, mammalian cells and plant cells. Mammalian cells that may be useful include those derived from rodents, primates and the like. They include HeLa cells, cells of fibroblast origin such as VERO, 3T3 or CHOK1, HEK 293 cells or cells of lymphoid origin (such as 32D cells) and their derivatives. Preferred mammalian host cells include nonadherent cells such as CHO, 32D, and the like.

In addition, plant cells are also available as hosts, and control sequences compatible with plant cells are available, such as the cauliflower mosaic virus 35S and 19S, nopaline synthase promoter and polyadenylation signal sequences, and the like. Appropriate transgenic plant cells can be used to produce transgenic plants.

Another preferred host is an insect cell, for example from the Drosophila larvae. Using insect cells as hosts, the Drosophila alcohol dehydrogenase promoter can be used (Rubin, Science 240:1453-1459, 1988). Alternatively, baculovirus vectors can be engineered to express large amounts of peptide encoded by a desired nucleic acid sequence in insect cells (Jasny, Science 238:1653, 1987; Miller et al., In: Genetic Engineering (1986), Setlow, J. K., et al., eds., Plenum, Vol. 8, pp. 277-297).

The genetically engineered cells of the invention are additionally useful as tools to screen for substances capable of modulating the activity of a protein encoded by a nucleic acid fragment of interest. Thus, an additional embodiment of the invention comprises methods of screening comprising contacting genetically engineered cells of the invention with a test substance and monitoring the cells for a change in cell phenotype, cell proliferation, cell differentiation, enzymatic activity of the protein or the interaction between the protein and a natural binding partner of the protein when compared to test cells not contacted with the test substance.

A variety of test substances can be evaluated using the genetically engineered cells of the invention including peptides, proteins, antibodies, low molecular weight organic compounds, natural products derived from, for example, fungal or plant cells, and the like. By "low molecular weight organic compound" it is meant a chemical species with a molecular weight of generally less than 500-1000. Sources of test substances are well known to those of skill in the art.

Various assay methods employing cells are also well known by those skilled in the art. They include, for example, assays for enzymatic activity (Hirth, et al, U.S. Pat. No. 5,763,198, issued Jun. 9, 1998), assays for binding of a test substance to a protein expressed by the genetically engineered cells, assays for transcriptional activation of a reporter gene, and the like.

Cells modified by the methods of the present invention can be maintained under conditions that, for example, (i) keep them alive but do not promote growth, (ii) promote growth of the cells, and/or (iii) cause the cells to differentiate or dedifferentiate. Cell culture conditions are typically permissive for the action of the recombinase in the cells, although regulation of the activity of the recombinase may also be modulated by culture conditions (e.g., raising or lowering the temperature at which the cells are cultured). For a given cell, cell-type, tissue, or organism, culture conditions are known in the art.

2.0.0 Transgenic Plants and Non-Human Animals

In another embodiment, the present invention comprises transgenic plants and nonhuman transgenic animals whose genomes have been modified by employing the methods and compositions of the invention. Transgenic animals may be produced employing the methods of the present invention to serve as a model system for the study of various disorders and for screening of drugs that modulate such disorders.

A "transgenic" plant or animal refers to a genetically engineered plant or animal, or offspring of genetically engineered plants or animals. A transgenic plant or animal usually contains material from at least one unrelated
organism, such as, from a virus. The term "animal" as used in the context of transgenic organisms means all species except human. It also includes an individual animal in all stages of development, including embryonic and fetal stages. Farm animals (e.g., chickens, pigs, goats, sheep, cows, horses, rabbits and the like), rodents (such as mice), and domestic pets (e.g., cats and dogs) are included within the scope of the present invention. In a preferred embodiment, the animal is a mouse or a rat.

The term "chimeric" plant or animal is used to refer to plants or animals in which the heterologous gene is found, or in which the heterologous gene is expressed in some but not all cells of the plant or animal.

The term transgenic animal also includes a germ cell line transgenic animal. A "germ cell line transgenic animal" is a transgenic animal in which the genetic information provided by the invention method has been taken up and incorporated into a germ line cell, therefore conferring the ability to transfer the information to offspring. If such offspring, in fact, possess some or all of that information, then they, too, are transgenic animals.

Methods of generating transgenic plants and animals are known in the art and can be used in combination with the teachings of the present application.

In one embodiment, a transgenic animal of the present invention is produced by introducing into a single cell embryo a nucleic acid construct, comprising an attD recombination site capable of recombining with an attT recombination site found within the genome of the organism from which the cell was derived and a nucleic acid fragment of interest in a manner such that the nucleic acid fragment of interest is stably integrated into the DNA of germ line cells of the mature animal and is inherited in normal Mendelian fashion. In this embodiment, the nucleic acid fragment of interest can be any one of the fragments described previously. Alternatively, the nucleic acid sequence of interest can encode an exogenous product that disrupts or interferes with expression of an endogenously produced protein of interest, yielding a transgenic animal with decreased expression of the protein of interest.

A variety of methods are available for the production of transgenic animals. A nucleic acid construct of the invention can be injected into the pronucleus, or cytoplasm, of a fertilized egg before fusion of the male and female pronuclei, or injected into the nucleus of an embryonic cell (e.g., the nucleus of a two-cell embryo) following the initiation of cell division (Brinster, et al., Proc. Nat. Acad. Sci. USA 82: 4438, 1985). Embryos can be infected with viruses, especially retroviruses, modified with an attD recombination site and a nucleic acid sequence of interest. The cell can further be treated with a site-specific recombinase as described above to promote integration of the nucleic acid sequence of interest into the genome.

By way of example only, to prepare a transgenic mouse, female mice are induced to superovulate. After being allowed to mate, the females are sacrificed by CO2 asphyxiation or cervical dislocation and embryos are recovered from excised oviducts. Surrounding cumulus cells are removed. Pronuclear embryos are then washed and stored until the time of injection. Randomly cycling adult female mice are paired with vasectomized males. Recipient females are mated at the same time as donor females. Embryos then are transferred surgically. The procedure for generating transgenic rats is similar to that of mice (see Hammer, et al., Cell 63:1099-1112, 1990). Rodents suitable for transgenic experiments can be obtained from standard commercial sources such as Charles River (Wilmington, Mass.), Taconic (Germantown, N.Y.), Harlan Sprague Dawley (Indianapolis, Ind.), etc.

The procedures for manipulation of the rodent embryo and for microinjection of DNA into the pronucleus of the zygote are well known to those of ordinary skill in the art (Hogan, et al., supra). Microinjection procedures for fish, amphibian eggs and birds are detailed in Houdebine and Chourrout (Experientia 47:897-905, 1991). Other procedures for introduction of DNA into tissues of animals are described in U.S. Pat. No. 4,945,050 (Sandford et al., Jul. 30, 1990).

Totipotent or pluripotent stem cells derived from the inner cell mass of the embryo and stabilized in culture can be manipulated in culture to incorporate nucleic acid sequences employing invention methods. A transgenic animal can be produced from such cells through injection into a blastocyst that is then implanted into a foster mother and allowed to come to term.

Methods for the culturing of stem cells and the subsequent production of transgenic animals by the introduction of DNA into stem cells using methods such as electroporation, calcium phosphate/DNA precipitation, microinjection, liposome fusion, retroviral infection, and the like are also well known to those of ordinary skill in the art. See, for example, Teratocarcinomas and Embryonic Stem Cells, A Practical Approach, E. J. Robertson, ed., IRL Press, 1987. Reviews of standard laboratory procedures for microinjection of heterologous DNAs into mammalian (mouse, pig, rabbit, sheep, goat, cow) fertilized ova include: Hogan et al., Manipulating the Mouse Embryo (Cold Spring Harbor Press 1986); Krimpenfort et al., 1991, Bio/Technology 9:86; Palmiter et al., 1985, Cell 41:343; Kraemer et al., Genetic Manipulation of the Early Mammalian Embryo (Cold Spring Harbor Laboratory Press 1985); Hammer et al., 1985, Nature, 315:680; Purcel et al., 1986, Science, 244:1281; Wagner et al., U.S. Pat. No. 5,175,385; Krimpenfort et al., U.S. Pat. No. 5,175,384, the respective contents of which are incorporated by reference.

The final phase of the procedure is to inject targeted ES cells into blastocysts and to transfer the blastocysts into pseudopregnant females. The resulting chimeric animals are bred and the offspring are analyzed by Southern blotting to identify individuals that carry the transgene. Procedures for the production of non-rodent mammals and other animals have been discussed by others (see Houdebine and Chourrout, supra; Pursel, et al., Science 244:1281-1288, 1989; and Simms, et al., Bio/Technology 6:179-183, 1988). Animals carrying the transgene can be identified by methods well known in the art, e.g., by dot blotting or Southern blotting.

The term transgenic as used herein additionally includes any organism whose genome has been altered by in vitro manipulation of the early embryo or fertilized egg or by any transgenic technology to induce a specific gene knockout. The term "gene knockout" as used herein, refers to the targeted disruption of a gene in vivo with loss of function that has been achieved by use of the invention vector. In one embodiment, transgenic animals having gene knockouts are those in which the target gene has been rendered nonfunctional by an insertion targeted to the gene to be rendered non-functional by targeting a pseudo-recombination site located within the gene sequence.

3.0.0 Gene Therapy and Disorders

A further embodiment of the invention comprises a method of treating a disorder in a subject in need of such treatment. In one embodiment of the method, at least one cell or cell type (or tissue, etc.) of the subject has a target recombination sequence (designated attT). This cell(s) is transformed with a nucleic acid construct (a "targeting
construct") comprising a second recombination sequence
(designated attD) and one or more polynucleotides of inter- 
est (typically a therapeutic gene). Into the same cell a 
recombinase is introduced that specifically recognizes the 
recombination sequences under conditions such that the 
nucleic acid sequence of interest is inserted into the genome 
via a recombination event between attT and attD. Subjects 
treatable using the methods of the invention include both 
humans and non-human animals. Such methods utilize the 
targeting constructs and recombinases of the present inven- 
tion. 

A variety of disorders may be treated by employing the 
method of the invention including monogenic disorders, 
infectious diseases, acquired disorders, cancer, and the like. 
Exemplary monogenic disorders include ADA deficiency, 
cystic fibrosis, familial hypercholesterolemia, hemophilia,
chronic granulomatous disease, Duchenne muscular
dystrophy, Fanconi anemia, sickle-cell anemia, Gaucher's 
disease, Hunter syndrome, X-linked SCID, and the like. 

Infectious diseases treatable by employing the methods of 
the invention include infection with various types of virus 
including human T-cell lymphotropic virus, influenza virus, 
papilloma virus, hepatitis virus, herpes virus, Epstein-Barr
virus, immunodeficiency viruses (HIV, and the like),
cytomegalovirus, and the like. Also included are infections
with other pathogenic organisms such as Mycobacterium
tuberculosis, Mycoplasma pneumoniae, and the like or
parasites such as Plasmodium falciparum, and the like.

The term "acquired disorder" as used herein refers to a 
noncongenital disorder. Such disorders are generally con- 
sidered more complex than monogenic disorders and may 
result from inappropriate or unwanted activity of one or 
more genes. Examples of such disorders include peripheral 
artery disease, rheumatoid arthritis, coronary artery disease, 
and the like. 

A particular group of acquired disorders treatable by 
employing the methods of the invention includes various
cancers, including both solid tumors and hematopoietic 
cancers such as leukemias and lymphomas. Solid tumors 
that are treatable utilizing the invention method include 
carcinomas, sarcomas, osteomas, fibrosarcomas, 
chondrosarcomas, and the like. Specific cancers include 
breast cancer, brain cancer, lung cancer (non-small cell and 
small cell), colon cancer, pancreatic cancer, prostate cancer, 
gastric cancer, bladder cancer, kidney cancer, head and neck 
cancer, and the like. 

The suitability of the particular place in the genome is 
dependent in part on the particular disorder being treated. 
For example, if the disorder is a monogenic disorder and the 
desired treatment is the addition of a therapeutic nucleic acid 
encoding a non-mutated form of the nucleic acid thought to 
be the causative agent of the disorder, a suitable place may 
be a region of the genome that does not encode any known 
protein and which allows for a reasonable expression level 
of the added nucleic acid. Methods of identifying suitable 
places in the genome are well known in the art and described 
further in the Examples below. 

The nucleic acid construct useful in this embodiment is 
additionally comprised of one or more nucleic acid frag- 
ments of interest. Preferred nucleic acid fragments of inter- 
est for use in this embodiment are therapeutic genes and/or 
control regions, as previously defined. The choice of nucleic 
acid sequence will depend on the nature of the disorder to be 
treated. For example, a nucleic acid construct intended to 
treat hemophilia B, which is caused by a deficiency of 
coagulation factor IX, may comprise a nucleic acid fragment 
encoding functional factor IX. A nucleic acid construct 
intended to treat obstructive peripheral artery disease may
comprise nucleic acid fragments encoding proteins that
stimulate the growth of new blood vessels, such as, for
example, vascular endothelial growth factor, platelet-
derived growth factor, and the like. Those of skill in the art
would readily recognize which nucleic acid fragments of
interest would be useful in the treatment of a particular
disorder.

The nucleic acid construct can be administered to the
subject being treated using a variety of methods. Adminis-
tration can take place in vivo or ex vivo. By "in vivo," it is
meant in the living body of an animal. By "ex vivo" it is
meant that cells or organs are modified outside of the body;
such cells or organs are typically returned to a living body.

Methods for the therapeutic administration of nucleic acid
constructs are well known in the art. Nucleic acid constructs
can be delivered with cationic lipids (Goddard, et al, Gene
Therapy, 4:1231-1236, 1997; Gorman, et al, Gene Therapy
4:983-992, 1997; Chadwick, et al, Gene Therapy
4:937-942, 1997; Gokhale, et al, Gene Therapy
4:1289-1299, 1997; Gao, and Huang, Gene Therapy
2:710-722, 1995, all of which are incorporated by reference
herein), using viral vectors (Monahan, et al, Gene Therapy
4:40-49, 1997; Onodera, et al, Blood 91:30-36, 1998, all of
which are incorporated by reference herein), by uptake of
"naked DNA", and the like. Techniques well known in the
art for the transfection of cells (see discussion above) can be
used for the ex vivo administration of nucleic acid con-
structs. The exact formulation, route of administration and
dosage can be chosen by the individual physician in view of
the patient's condition. (See e.g. Fingl et al., 1975, in "The
Pharmacological Basis of Therapeutics", Ch. 1 p. 1).

It should be noted that the attending physician would
know how to and when to terminate, interrupt, or adjust
administration due to toxicity, to organ dysfunction, and the
like. Conversely, the attending physician would also know
how to adjust treatment to higher levels if the clinical
response were not adequate (precluding toxicity). The mag-
nitude of an administered dose in the management of the
disorder being treated will vary with the severity of the
condition to be treated, with the route of administration, and
the like. The severity of the condition may, for example, be
evaluated, in part, by standard prognostic evaluation meth-
ods. Further, the dose and perhaps dose frequency will also
vary according to the age, body weight, and response of the
individual patient.

In general, at least 1-10% of the cells targeted for genomic
modification should be modified in the treatment of a
disorder. Thus, the method and route of administration will
optimally be chosen to modify at least 0.1-1% of the target
cells per administration. In this way, the number of admin-
istrations can be held to a minimum in order to increase the
efficiency and convenience of the treatment.

Depending on the specific conditions being treated, such
agents may be formulated and administered systemically or
locally. Techniques for formulation and administration may
be found in "Remington's Pharmaceutical Sciences," 1990,
18th ed., Mack Publishing Co., Easton, Pa. Suitable routes
may include oral, rectal, transdermal, vaginal, transmucosal,
or intestinal administration; parenteral delivery, including
intramuscular, subcutaneous, intramedullary injections, as
well as intrathecal, direct intraventricular, intravenous,
intraperitoneal, intranasal, or intraocular injections, just to
name a few.

The subject being treated will additionally be adminis-
tered a recombinase that specifically recognizes the attT and
attD recombination sequences that are selected for use. The
particular recombinase can be administered by including a
nucleic acid encoding it as part of a nucleic acid construct,
or as a protein to be taken up by the cells whose genome is
to be modified. Methods and routes of administration will be
similar to those described above for administration of a
targeting construct comprising a recombination sequence
and nucleic acid sequence of interest. The recombinase
protein is likely to be required only for a limited period of
time for integration of the nucleic acid sequence of interest.
Therefore, if introduced as a recombinase gene, the vector
carrying the recombinase gene will lack sequences mediat-
ing prolonged retention. For example, conventional plasmid
DNA decays rapidly in most mammalian cells. The recom-
binase gene may also be equipped with gene expression
sequences that limit its expression. For example, an induc-
ible promoter can be used, so that recombinase expression
can be temporally limited by limited exposure to the induc-
ing agent. One such exemplary group of promoters is the
tetracycline-responsive promoters, the expression of which
can be regulated using tetracycline or doxycycline.

The invention will now be described in greater detail by 
reference to the following non-limiting Examples. 

EXAMPLES 
Example 1
Identification of Pseudo-recombination Sequences 
The following example describes the identification of 
pseudo-loxP sequences by computer search. Similar proce- 
dures can be used to identify other pseudo-recombination 
sequences. 

The findpatterns algorithm of the Wisconsin Software
Package Version 9.0 developed by the Genetics Computer
Group (GCG; Madison, Wis.), was used to screen all
sequences in the GenBank database (Benson et al., 1998,
Nucleic Acids Res. 26, 1-7). Default parameters are given
below. Patterns resembling the wild-type loxP sequence,
called pseudo-loxP sites (ψlox) herein, were sought. The
results from two different search strategies (Patterns #1 and
#2, see below) were pooled.

The wild-type loxP site is 34 base pairs long and consists
of two identical thirteen-basepair palindromes, separated by
an eight-basepair core. It has been demonstrated that, while
strand cutting and exchange take place in the eight-basepair
core, the DNA sequence of most of this core is not critical,
as long as it matches between the two sites that are to
recombine (Hoess et al., 1986, Nucleic Acids Res. 14,
2287-2300; Sauer, 1996, Nucleic Acids Res. 24,
4608-4613). Therefore, most of these bases were set as n's
in the search algorithm. Nucleic acid constructs created
using the principles embodied in the invention allow for full
control over the sequence of the incoming lox site, as its
eight-basepair core can be made to match that of the
genomic site being targeted. This feature of the recombina-
tion reaction gives the desired level of specificity, allowing
targeting of only one ψlox site in the genome.
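The core-matching idea described above can be illustrated with a short sketch. It assumes the 13-8-13 layout of the 34-basepair site and uses the thirteen-basepair loxP arms given in Pattern #1; the function names and the example genomic site are hypothetical and serve only as an illustration.

```python
# Sketch: build an incoming lox site whose 8-bp core matches a genomic
# pseudo-lox site. Arm sequences are those of Pattern #1 (SEQ ID NO:4);
# the function names and the example site are illustrative assumptions.
LOXP_LEFT_ARM = "ATAACTTCGTATA"   # 13-bp palindrome, left
LOXP_RIGHT_ARM = "TATACGAAGTTAT"  # 13-bp palindrome, right
ARM_LEN = 13
CORE_LEN = 8
SITE_LEN = 2 * ARM_LEN + CORE_LEN  # 34


def core_of(site: str) -> str:
    """Return the 8-bp core of a 34-bp lox-like site."""
    if len(site) != SITE_LEN:
        raise ValueError(f"expected a {SITE_LEN}-bp site, got {len(site)}")
    return site[ARM_LEN:ARM_LEN + CORE_LEN].upper()


def matched_lox_site(genomic_site: str) -> str:
    """Wild-type loxP arms flanking the core copied from the genomic site."""
    return LOXP_LEFT_ARM + core_of(genomic_site) + LOXP_RIGHT_ARM


# Hypothetical 34-bp genomic pseudo-lox site (not taken from the figures).
example_site = "ATACCTTCATATA" + "CCTTAAGG" + "TATATGAAGTAAT"
incoming = matched_lox_site(example_site)
```

Because only the core is copied from the genomic site, the incoming site keeps the arms that Cre recognizes while its core matches the single target chosen.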

Previous studies have suggested that the central bases of
the thirteen-basepair palindrome, those closest to the eight-
basepair core, are important for Cre recognition. Therefore,
greater weight was given to matching the inner four or five
positions of the palindrome.

Using search Pattern #1, the search was constructed so that
the search program would only look for resemblance in the
thirteen-basepair palindromic regions of the loxP site. The
sequence entered into the search algorithm is shown below:
Pattern #1: ATAACTTCGTATA(n){8}TATACGAAGTTAT (SEQ ID NO:4).
The (n){8} allows the program to substitute any eight
nucleotides in the region between the two thirteen-basepair
inverted repeats and only look for similarity to the thirteen-
basepair inverted repeats. Both strands were searched and no
gaps or extensions were allowed.

When the search was conducted allowing for a maximum 
of eight mismatches, a large number of hits were obtained in 
the primate database. The total number of sequences 
searched was 73,825, representing 118,684,866 basepairs of 
sequence. The hits obtained from this search were then 
reviewed to identify likely pseudo-loxP candidates. 
Sequences having exact matches of at least four or five 
nucleotides immediately adjacent to the core on each side 
were given preference because mismatches more than five 
nucleotides away from the core on either side may be 
tolerated to some extent by Cre recombinase. A similar 
search was undertaken with the rodent database. 
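A mismatch-tolerant, gap-free search of this kind can be sketched as follows. This is not the GCG findpatterns program; it is a minimal stand-in written for illustration, in which `n` positions of the pattern are ignored, both strands are scanned, and a window is reported when its mismatch count is at or below a threshold. All function names are assumptions.

```python
# Sketch of a gap-free pattern scan with a mismatch limit, in the spirit of
# the Pattern #1 search. Not the GCG implementation; names are illustrative.
PATTERN_1 = "ATAACTTCGTATA" + "n" * 8 + "TATACGAAGTTAT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_mismatches(window: str, pattern: str) -> int:
    """Mismatches at specified positions; 'n' in the pattern matches anything."""
    return sum(1 for w, p in zip(window, pattern) if p != "n" and w != p)


def find_pattern(seq: str, pattern: str = PATTERN_1, max_mismatches: int = 8):
    """Return (strand, start, window, mismatches) for every qualifying window.

    For the '-' strand, start is the offset within the reverse complement.
    """
    seq = seq.upper()
    width = len(pattern)
    hits = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(target) - width + 1):
            window = target[start:start + width]
            mismatches = count_mismatches(window, pattern)
            if mismatches <= max_mismatches:
                hits.append((strand, start, window, mismatches))
    return hits


# Toy sequence: a perfect Pattern #1 site with an arbitrary core.
site = "ATAACTTCGTATA" + "ACGTACGT" + "TATACGAAGTTAT"
perfect_hits = find_pattern("GGG" + site + "CCC", max_mismatches=0)
```

Because the two arms of Pattern #1 are reverse complements of each other, a site found on one strand is generally also reported on the other, so hits may need to be merged before review.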

Search Pattern #2 made use of additional search criteria 
derived from structural studies of Cre. The crystal structure 
at 2.4 angstrom resolution of Cre recombinase complexed 
with loxP DNA reveals that contact is made between Cre and 
its target site at certain bases (Guo et al., 1997, Nature 389, 
40-46). Footprinting with Fe-EDTA using Cre bound to the 
loxP site also reveals points of contact between Cre and 
bases in the loxP site (Hoess et al., 1990, J. Mol. Biol. 216, 
873-882). These bases can be weighted more heavily to 
favor matching with the wild-type site. The search formula 
for determining a fit to these structural criteria was as 
follows for the 34-basepair lox site: 
Pattern #2: ATnACnnCnTATA nnnTAnnn TATAnGnnGTnAT (SEQ ID NO:5).
Again, both strands were searched and no gaps or extensions 
were allowed. A search demanding four or fewer mis- 
matches with the specified 16 basepairs yielded an extensive 
list of matches with the extant DNA sequences. 

Searches were done in GenBank in the Primate, Rodent, 
Invertebrate, Plant, Fungus, and Bacteria databases. Some of 
the sites identified using these methods are shown in FIGS.
8A and 8B. The core sequences are shown in boldface type.

Example 2 

In vitro Excision Assay of Pseudo-lox Sites in 
Bacteria and Human Cells 

The following example demonstrates that the pseudo- 
recombination sequences of the invention are functional as 
sites for recombination of a nucleic acid sequence by a 
site-specific recombinase. 

A negative control plasmid, pLCG1 (FIG. 1A), was
created by inserting a 4.3-kb XbaI-BspHI fragment contain-
ing the lacZ gene, encoding β-galactosidase, driven by the
CMV promoter (from pCMVSPORT-βgal, Gibco/BRL) into
the EcoRV site of pLitmus29 (New England Biolabs,
Beverly, Mass.) in the opposite orientation to the lacZα
gene already present in the plasmid. This plasmid was then
used as a base for the construction of other plasmids used in
the excision assay. A very similar negative control plasmid,
pL2p50, was used in some of the experiments in place of
pLCG1. Briefly, annealed oligonucleotides containing the
lox sites being tested and a marker restriction enzyme site
were directionally cloned into the BamHI-HindIII sites on
one side and the BglII-XhoI sites on the other side of the
CMV-lacZ construct. This cloning was carried out to ensure
that Cre-induced site-specific recombination would result in
excision of the lacZ marker gene. A schematic representation
of the plasmids is shown in FIGS. 1A through 1C. FIG. 1D
shows the DNA sequences of the lox sites from pWTLox²
shown in FIG. 1B (top line of FIG. 1D) and plasmid
pψloxh7q21 shown in FIG. 1C (bottom lines of FIG. 1D).

The positive control plasmid used in the excision assay
(pWTLox², FIG. 1B) had the 34-bp wild-type loxP site
cloned into both the BamHI-HindIII site and the BglII-XhoI
site. The test plasmids had a pseudo-recombination site
cloned into the BglII-XhoI site and a recombination site
containing the 13-bp palindromic repeats of loxP flanking
the core sequence of the pseudo-recombination sequence
cloned into the BamHI-HindIII site.

The bacterial strain used for the excision assay, 294-Cre 
(Buchholz, et al, Nucleic Acids Research 24:3318-3319, 
1996) has been designed to constitutively express Cre 
recombinase at 37° C. 

Approximately 1 ng of the DNA being tested was elec-
trotransformed into the 294-Cre strain of E. coli using the
Bio-Rad Gene Pulser (BioRad Laboratories, CA) at a field
strength of 12.5 kV/cm, with a capacitance of 25 μF and
resistance of 200 Ω. Aliquots of the transformation mix were
spread on plates containing ampicillin (100 μg/ml), methi-
cillin (100 μg/ml), and X-gal (60 μg/ml). The plates were
incubated at 37° C. for 18 hours, after which they were
scored for the presence of blue and white colonies. Bacteria
containing the parent plasmid pLCG1 generated a blue
bacterial colony when grown on these plates, whereas bac-
teria containing a plasmid from which lacZ sequence has
been excised generated a white colony. The excision fre-
quency was defined as the ratio of the number of white
colonies to the total number of colonies, expressed as a
percentage.
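The definition of excision frequency given above amounts to a one-line calculation. The sketch below shows it with hypothetical colony counts; the function name and the counts are illustrative assumptions and are not data from the experiments.

```python
# Sketch: excision frequency as white colonies / total colonies, in percent.
# The colony counts used below are hypothetical, not experimental data.
def excision_frequency(white: int, blue: int) -> float:
    """Percentage of colonies that are white (lacZ excised)."""
    total = white + blue
    if total == 0:
        raise ValueError("no colonies were scored")
    return 100.0 * white / total


# Example plate: 23 white and 177 blue colonies out of 200 scored.
example_frequency = excision_frequency(white=23, blue=177)
```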

As shown in Table 1 below, the excision frequency was 
close to 100% when the wild-type loxP sequences were 
present on the plasmid (positive control) and no excision 
was observed when no loxP sites were present. 



TABLE 1

lox Site Tested     Efficiency (%)

none                 0.00
loxP                98.9
ψlox h7q21          11.5
ψlox h7q31           8.9
ψlox hXp22          99.0
ψlox h5p15           1.4
ψlox m9              4.0
ψlox m5             98.7



The results above are based on 4 to 13 separate
experiments for each plasmid tested. The data indicate that
pseudo-recombination sequences are functional, and some
pseudo-recombination sequences (ψlox hXp22 and ψlox
m5) promote recombination at very high frequencies, com-
parable to the wild-type loxP sequence.

In conjunction with the data of Example 1, these recom-
bination efficiency results help identify which basepairs
within loxP are most critical for Cre binding. A strict
correlation between the number of mismatches and the
recombination efficiency was not observed. Therefore, it is
clear that matches at specific positions are more important
than overall homology. These results are consistent with the
idea that the four bases flanking the core are important, as
the ψlox h5p15 site, which has a mismatch in this region while
otherwise having good matches, had the lowest recombina-
tion frequency. The wild-type core sequence was not
required. For example, ψlox m5, which had a recombination
frequency indistinguishable from that of loxP, had no
matches to loxP in the 8-bp core. However, the best sites had
only A and T basepairs in the central two positions of the
core, indicating that this feature may be important.

The four ψlox sequences identified by using Pattern #2,
ψlox hXp22, ψlox h5p15, ψlox m5, and ψlox m9, included
the two ψlox sites with the highest excision efficiencies,
ψlox hXp22 and ψlox m5, indistinguishable from loxP. On
the other hand, ψlox h5p15, also obtained using Pattern #2,
had the lowest recombination efficiency of the sites tested,
probably because it contained a mismatch in the four posi-
tions nearest the core. These results suggest that while these
first four positions are critical, the requirement for matching
at the first five positions, used in screening the sites obtained
with search Pattern #1, was overly restrictive. Good results
would be obtained by using Pattern #2 in combination with
a stringent requirement for matching at the first four posi-
tions from the core.
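The stringent requirement suggested above, an exact match at the four positions on each side of the core, can be expressed as a simple filter on 34-basepair candidates. The sketch assumes the 13-8-13 layout and the loxP arms of Pattern #1; the function names and the test sites are hypothetical.

```python
# Sketch: keep only candidate sites whose four core-adjacent positions on
# each side match loxP exactly. Names and example sites are illustrative.
LOXP_LEFT_ARM = "ATAACTTCGTATA"
LOXP_RIGHT_ARM = "TATACGAAGTTAT"
ARM_LEN = 13
CORE_LEN = 8


def core_adjacent_mismatches(site: str, flank: int = 4) -> int:
    """Count mismatches to loxP in the `flank` bases on each side of the core."""
    if len(site) != 2 * ARM_LEN + CORE_LEN:
        raise ValueError("expected a 34-bp site")
    site = site.upper()
    observed = site[ARM_LEN - flank:ARM_LEN] + \
        site[ARM_LEN + CORE_LEN:ARM_LEN + CORE_LEN + flank]
    expected = LOXP_LEFT_ARM[-flank:] + LOXP_RIGHT_ARM[:flank]
    return sum(1 for o, e in zip(observed, expected) if o != e)


def passes_core_adjacent_filter(site: str, flank: int = 4) -> bool:
    """True when every core-adjacent position matches the wild-type arms."""
    return core_adjacent_mismatches(site, flank) == 0


# Hypothetical candidates: the outer arm positions are deliberately poor
# matches, so only the core-adjacent positions decide the outcome.
good = "GGGGGGGGG" + "TATA" + "CCCCCCCC" + "TATA" + "GGGGGGGGG"
bad = "GGGGGGGGG" + "TACA" + "CCCCCCCC" + "TATA" + "GGGGGGGGG"
```

Such a filter would be applied after a pattern search, to rank or discard hits before any are tested experimentally.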

A similar assay was carried out in mammalian cells.
Briefly, a plasmid expressing Cre, pBS185 (Life Technolo-
gies Inc., Grand Island, N.Y.) was modified by the insertion
of a kanamycin resistance gene into the unique ScaI site to
create pBS185-Kan. This modification renders cells trans-
fected with the plasmid resistant to kanamycin but sensitive to
ampicillin. Approximately 2 μg of plasmid pBS185-Kan and
50 ng of one of the plasmids used in the bacterial assay
described above were transfected into 293 (ATCC Accession
No. 1573), human embryonic kidney cells, using Lipo-
fectAmine (Life Technologies) following the manufactur-
er's recommendations. The transfected cells were treated
with DNaseI 24 hours after transfection. The cells were
grown at 37° C. in Dulbecco's Modified Eagle medium
(DMEM) for 72 hours after which low molecular weight
DNA was isolated from the cells by Hirt extraction (Hirt, J.
Mol. Biol. 26:365-369, 1967). The plasmid DNA was elec-
trotransformed into E. coli strain DH10B (Life
Technologies) under the conditions described above. Ali-
quots of the transformed bacteria were grown on amp/meth/
X-gal plates as described above and scored for the presence
of blue and white colonies.

Exemplary results are shown in FIG. 2. The frequency of 
excision seen in a mammalian cell background demonstrates 
the predictive nature of the bacterial assay system and 
demonstrates that the pseudo-recombination sequences of 
the invention are active substrates for recombinase-mediated 
recombination in a mammalian cell environment. 

The ψlox h7q21 and ψlox hXp22 sites may mediate
integration into the human genome. The ψlox h7q21 site is
located in the q21 region of chromosome 7, while the ψlox
hXp22 site is situated in band p22 of the X chromosome.
The existence of these sequences in the human genome was
verified by sequencing the appropriate PCR fragments cov-
ering the sites from human genomic DNA. Neither site is
located in a coding sequence or a known gene.

Example 3 

In vitro Transient Integration Assay of Pseudo-lox
Sites in Human Cells
The following example provides a model system for
assessing the ability of the pseudo-recombination sequences
of the invention to promote genomic modification by site-
specific insertion.

The ψlox site to be tested was placed on a plasmid having
tetracycline resistance (FIG. 3, upper left). This plasmid
represented the chromosome and was the recipient for
integration events. A lox site having the wild-type loxP
palindromes and the 8-bp core of ψlox h7q21 was placed
next to the lacZ gene on a second plasmid, this one having
ampicillin resistance (FIG. 3, upper right). This plasmid
represented the incoming donor vector. These plasmids were
constructed as follows: The plasmid pTM1 was generated by
cloning a 155 base-pair AflIII-SnaBI fragment from pLit-
mus29 containing the multiple cloning site into a unique
[illegible] of pUC19 (C. R. Sclimenti and M.P.C.,
unpublished). The lox sites of interest were then cloned into
the BglII-XhoI site of this plasmid to generate the recipient
plasmids for the integration assay (pRWT and pRh7q21).

The plasmid pLGWTLox² was used as a base for the
construction of the donor plasmids used in the integration
assay. pLGWTLox² was created by treating pWTLox² with
EcoRI and subsequent religation to excise the CMV pro-
moter and create a unique EcoRI site between one of the
loxP sites and the lacZ gene. Complementary oligonucle-
otides containing the loxP-derived palindromes with the
core derived from the ψlox h7q21, a marker enzyme site,
and EcoRI half-sites at the ends were annealed and ligated
into the unique EcoRI site of pLGWTLox² to generate the
pDh7q21 donor plasmid for the transient integration assay.

To perform the assay, 50 ng of the tetracycline-resistant
recipient plasmid and 1 μg of the ampicillin-resistant donor
plasmid were co-transfected into human 293 cells with
Lipofectamine along with 2 μg of the Cre expression vector
pBS185-Kan. The transfected cells were treated with
DNaseI 24 hours after transfection. After 72 hours in human
cells, plasmid DNA was purified by Hirt extraction (Hirt, J.
Mol. Biol. 26:365-369, 1967) and returned to the DH10B
strain of E. coli for detection of integration events. Plasmids
that underwent integration were tetracycline resistant and
now also carried lacZ (FIG. 3, lower left). They thus gave
rise to blue colonies when plated on LB medium containing
tetracycline and X-gal and incubated overnight at 37° C.
Plasmid DNA was purified from blue colonies, and those
plasmids with the restriction pattern expected for integration
were classified as integrants. Each blue colony was
restreaked on LB plates containing X-gal and either ampi-
cillin and methicillin or tetracycline. One representative
plasmid was sequenced in the relevant regions to document
integration at lox sites. The integration frequency was cal-
culated as the number of integrants divided by the total
number of tetracycline-resistant colonies.

The integration assay was performed with recipients
bearing the ψlox h7q21 site or controls having either the
wild-type loxP site or no lox site, along with the correspond-
ing donors. The integration frequency at the wild-type loxP
site was 0.41%. Integration at the ψlox h7q21 site was
readily detectable and occurred at a frequency of 0.12%.
Experiments performed with either the recipient alone or the
donor alone in the presence or absence of the Cre expression
plasmid did not yield any integrants. Transfection of the
recipient and the donor in the absence of the Cre expression
plasmid also failed to yield any integrants. These results
demonstrate that detectable site-specific integration occurs
at a pseudo-lox site in the human cell environment.

A second type of shuttle vector system that can be used to
model chromosomal integration utilizes modified autono-
mously replicating vectors such as those described in issued
U.S. Pat. No. 5,707,830. These types of vectors replicate
stably in human cells and have a very low endogenous
mutation frequency (DuBridge, et al., Mol. Cell. Biol.
7:379-387, 1987). Thus, they provide better models for the
chromosome than newly transfected plasmid DNA. One
preferred shuttle vector may have EBNA-1 sequences, the
EBV family of repeats, oriP or a human chromosomal ori, a
bacterial origin of replication, and a pseudo-lox sequence
and a marker gene such as one conferring hygromycin
resistance. This vector is established in mammalian cells
by antibiotic selection. The cells are transfected with a
plasmid expressing Cre and a plasmid having a lox recom-
bination sequence and a marker such as a gene coding
for chloramphenicol resistance. The assay is performed as
described above.

Example 4

In vitro Chromosomal Assay for Integration
Efficiency

The following example evaluates the efficiency at which
a heterologous nucleic acid sequence can be inserted into a
chromosome at a particular pseudo-recombination site
(integration efficiency) and the level of expression of a gene
sequence inserted therein.

Bicistronic assay vectors are constructed containing, for
example, a gene coding for hygromycin resistance under the
control of the thymidine kinase promoter and a gene encod-
ing the enzyme chloramphenicol acetyl transferase (CAT)
under the control of the cytomegalovirus immediate early
promoter (Wohlgemuth, et al., Gene Therapy 3:503-512,
1996). The former marker is used primarily to assess inte-
gration frequency while the latter marker is useful for
sensitively assaying the level and duration of gene expres-
sion. The vector additionally carries a lox sequence contain-
ing the core of the pseudo-loxP sequence under evaluation.
The test plasmid is transfected into mammalian cells, such
as 293S cells (human) or NIH3T3 cells (mouse), along with
a Cre-expressing plasmid, such as one of those described
above. The transfected cells are grown in the presence of
hygromycin and the number of hygromycin resistant colo-
nies scored as a measure of integration frequency. A number
of antibiotic resistant colonies are propagated and analyzed
by polymerase chain reaction (PCR) and Southern blotting
to determine whether they have an integration event targeted
to the correct lox site. CAT gene expression is measured as
follows. Cell extracts are prepared by standard procedures
and the extract is normalized for total protein
concentration and assayed for CAT activity as described by
Gorman, et al., Proc Natl Acad Sci USA 79:6777, 1982 or
Wohlgemuth, supra.

Example 5

In vivo Assay for Integration

The following assay evaluates the ability of a recombi-
nation sequence to promote integration of a heterologous
nucleic acid sequence into a genome in vivo.

The in vivo integration and expression of the CAT gene by
employing the teaching of the invention is evaluated essen-
tially as described by Zhu, et al., Science 261:209-211, 1993.
Vectors, one containing a lox recombination sequence and
CAT gene and one expressing Cre, are mixed with liposomes
that have a net cationic charge, for example, containing
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium
chloride (DOTMA) (Felgner, et al., Proc Natl Acad Sci USA
84:7413, 1987) and dioleoyl phosphatidylethanolamine
(DOPE) in a 1:1 ratio. The ratio of DNA to liposomes is
typically 1:1. The liposome/DNA mixture is typically
injected into test mice in 200 μl of 5% dextrose in water
intravenously through the tail vein.
At various time points, starting at 24 hours post-injection,
test mice are sacrificed and various tissues harvested and
homogenized. Cleared homogenates are assayed for CAT
enzyme activity using a scintillation counting assay (Seed
and Sheen, Gene 67:271-277, 1988) with the following
modifications: 0.3 μCi of 14C-labeled chloramphenicol (55
mCi/mmol) is added to 200 nmol of acetyl coenzyme A for
a final volume of 122 μl. CAT activity is expressed as either
CAT enzyme/weight of tissue or as a function of milligrams
of protein in each tissue extract. Tissue extracts are prepared
by standard procedures and total protein determined using
standard protocols (Bradford, Lowry, and the like).

Example 6

The following example describes a rapid assay to measure
site-specific integration by a recombinase. This assay was
used to measure integration of the wild-type φC31 attB
sequence into the wild-type φC31 attP sequence in the
presence of the φC31 integrase. A similar assay can be used
to measure integration mediated by other recombinases of
interest, such as the integrases of phages R4 and TP-901.

Integrase-expressing plasmids were constructed as fol-
lows. The φC31 integrase gene was amplified by the poly-
merase chain reaction from the plasmid pIJ8600 containing
the φC31 integrase and attP (M. Bibb, John Innes Institute,
Norwich, U.K.) with the following primers:
5'GAACTAGTCGTAGGGTCGCCGACATGACAC3' (SEQ ID NO:6) and
5'GTGGATCCGGGTGTCTCGCTACGCCGCTAC3' (SEQ ID NO:7).
The PCR product was ligated into
linear pCR2.1 (Invitrogen, Carlsbad, Calif.) at the T over-
hang to make the plasmid pTA-Int. The lacZ gene was
removed from pCMVSPORTβGal (Life Technologies,
Grand Island, N.Y.) by digestion with the restriction
enzymes BamHI and SpeI, and replaced by the integrase
gene from pTA-Int with BamHI and SpeI compatible ends,
creating the plasmid, pCMVInt (FIG. 4B), which expresses
φC31 integrase in mammalian cells under control of the
cytomegalovirus immediate early promoter.

The integrase gene was subsequently removed from
pCMVSPORTInt by digestion with BamHI and PstI and
ligated into pACYC177 (resistances ampicillin and
kanamycin) (S. Cohen, Stanford University, Stanford,
Calif.) that had also been treated with BamHI and PstI,
removing part of the ampicillin resistance gene. Finally, the
lacZ promoter was removed from pBCSK+ (Stratagene, La
Jolla, Calif.) by digestion with SacI and SapI. The integrase-
containing pACYC plasmid was digested with PstI and SacI,
and the lacZ promoter was inserted upstream of the integrase
gene with a linker (5'GCTCGGCCAAAAAGGCCTGCA3'
(SEQ ID NO:8), 5'GGCCTTTTTGGCCG3' (SEQ ID
NO:9)), creating the plasmid, pInt (FIG. 4A), expressing the
φC31 integrase under control of the lacZ promoter.

The intramolecular integration assay plasmid was con-
structed as follows. The bacterial attachment site for φC31
(attB) was amplified by PCR from Streptomyces lividans
genomic DNA (S. Cohen, Stanford University, Stanford,
Calif.) with the primers:
5'CAGGTACCGTCGACGATGTAGGTCACGGTC3' (SEQ ID NO:10) and
5'GTCGACATGCCCGCCGTGACCG3' (SEQ ID NO:11). This attB frag-
ment was ligated into linear pCR2.1 at the T overhang sites
to create the plasmid pTA-attB containing a 285 bp attB
region. The phage attachment site (attP) was amplified by
PCR from pIJ8600 with the primers
5'CGACTAGTACTGACGGACACACCGAA3' (SEQ ID NO:12),
5'GTACTAGTCGCGCTCGCGCGACTGACG3' (SEQ ID NO:13)
and ligated into linear pCR2.1 at the T overhang sites to
create the plasmid pTA-attP, containing a 221 bp attP region.
The lacZα was removed from pBCSK+ by digestion with
PvuI and KpnI, treatment with T4 polymerase, and religa-
tion. The full length lacZ gene from pCMVSPORTβGal was
removed by digestion with SpeI and HindIII and cloned into
the SpeI and HindIII sites of the lacZα deficient pBCSK+ to
make pBCβGal. The attP was then removed from pTA-attP
by SpeI digestion and cloned into the SpeI site of pBCβGal.
The attB was then removed from pTA-attB by SalI digestion
and cloned into the SalI site of the attP containing pBCβGal,
to create the assay plasmid pBCPB+ (FIG. 4C), in which the
TTG cores of the att sites are in the same orientation. In
addition, a control plasmid, pBCPB-, in which the att sites
were in opposite orientations, was also constructed.

The pInt plasmid was then transformed into DH10B
bacteria, grown under kanamycin selection, and made elec-
trocompetent by a standard protocol. The resulting electro-
competent DHInt cells were used in the bacterial intramo-
lecular integration assay, conducted as follows. 200 ng of the
assay plasmid of choice was electroporated into DHInt cells,
allowed to recover for one hour, spread on plates containing
chloramphenicol and Xgal, and grown at 37° C. If an
intramolecular integration event occurs, the lacZ gene
located between the attB and attP sites will be excised, and
the resulting colonies will be white. The frequency of intramo-
lecular integration was therefore calculated as the number of
white colonies divided by the total number of colonies.

When this assay was carried out in DHInt bacteria using
pBCPB+, all colonies were white, indicating efficient inte-
gration. Thousands of colonies were assayed for each plas-
mid tested. The same plasmid produced only blue colonies
in DH10B bacteria, in the absence of the integrase gene.
These results verify that the assay plasmid carried functional
attB and attP sites and that the φC31 integrase functioned
efficiently in E. coli with no added co-factors. In contrast, the
plasmid pBCPB-, which carried the att sites in inverted
orientation, resulted in blue colonies, because the lacZ gene
was merely inverted, not excised, by the integration reac-
tion. The assay plasmid with no att sites, pBCSK-βgal, also
yielded only blue colonies. Restriction
enzyme digestion of plasmid DNA purified from a repre-
sentative number of white colonies verified that the intramo-
lecular integration reaction occurred as expected and
resulted in deletion of lacZ between the attB and attP sites.

Example 7

Intramolecular Integration Assay in Mammalian
Cells

The following example demonstrates the ability of phage
φC31 integrase to integrate sequences site-specifically and
efficiently in a mammalian cell environment.

To perform the intramolecular integration assay in human
cells, the same pBCBP+ plasmid was used as in the bacterial
assay of Example 6. The pCMVInt plasmid was substituted
for pInt to ensure expression of φC31 integrase in mamma-
lian cells. Subconfluent (60-80%) 60 mm plates of human
293 cells grown in DMEM supplemented with 9% fetal
bovine serum and 1% penicillin/streptomycin were trans-
fected with lipofectamine (Life Technologies) at a ratio of 6
μg lipofectamine per μg of DNA. Experiments were per-
formed with 100 ng of the assay plasmid of interest and 2 μg
of pCMVInt. Controls performed in each experiment
included no DNA, pCMVInt only, pBCSK-βgal (assay
plasmid with no att sites), pBCSK-βgal+pCMVInt, and
pBCPB+ alone.

Twenty-four hours after transfection, the medium was
supplemented with 50 Units/ml of DNaseI to reduce the
background of untransfected DNA. Three days after
transfection, the cells were harvested and low molecular
weight DNA was recovered by using the Hirt procedure
(Hirt, J. Mol. Biol. 26:365-369, 1967). A portion of this DNA
was electroporated into competent DH10B E. coli cells and
spread on plates containing chloramphenicol and Xgal to
select only for the assay plasmid. The intramolecular inte-
gration frequency was determined to be the number of white
colonies divided by the total number of colonies.

Using this assay system in mammalian cells, the φC31
integrase was shown to catalyze recombination between the
full-length attB and attP sites of pBCBP+ at a frequency of
50.6% (mean of 16 experiments, standard error=2.32%).
This frequency is likely to be an underestimate as plasmid
DNA that never came in contact with the φC31 integrase was
probably present, despite efforts to remove untransfected
DNA with DNaseI. It is clear that the φC31 integrase
catalyzes efficient site-specific integration in mammalian
cells.

To verify site-specific recombination, 96 white colonies
were picked and plasmid DNA was prepared and examined
by restriction digestion. Of these, 97% contained a plasmid
that represented the expected site-specific recombinant. The
remaining colonies contained plasmids that carried large
rearrangements that disrupted lacZ. The low-frequency
rearrangement of transfected plasmids was observed with all
plasmids, with and without integrase and att sites, and can be
attributed to transfection-associated mutation of newly
introduced DNA.

Example 8

HindIII sites was removed. This fragment was replaced by
the series of synthetic shorter sites having ends permitting
their orientation-appropriate cloning into pBCBP+. The
resulting plasmids were electroporated into DHInt E. coli
cells and recombinants were scored as white colonies, as
described in Example 6 above. FIG. 5 (left side) shows the
results of these experiments. AttB sites of 50, 40, 35, and 34
basepairs all provided full recombination function, i.e. they
functioned at 100% of the efficiency of the full-length attB.
Reduction of the site to 33 basepairs produced a marked
decrease in recombination activity. Therefore, 34 basepairs
was determined to be the minimal functional size of attB.

Once attB was determined to be 34 basepairs long, attP
was subjected to a similar set of reductions. The reduced attP
sites were assayed on a plasmid carrying attB34 rather than
full-length attB. To perform these experiments, the full-
length attP surrounded by SacII and SpeI sites was replaced
with a series of synthetic annealed oligonucleotides bearing
ends permitting their correct orientation-specific cloning
into pBCPB+-attB34. FIG. 5 (right side) depicts the results
of these experiments. The function of attP dropped off as its
size was reduced from 40 to 36 basepairs. The DNA
sequence revealed that the 38 basepair site encompassed the
major inverted repeat evident in attP. However, it was
[illegible] function (P39A&B). From this analysis, the
minimal size of attP was determined to be 39 basepairs.

To determine the frequency at which the reduced att sites
function in mammalian cells, the same panel of plasmids
was analyzed by using the intramolecular integration assay
described in Example 7. Each of the assay plasmids was
transfected into human 293 cells along with pCMVInt. After
72 hours in the mammalian cells, the plasmid DNA was
purified by the method of Hirt (Hirt, J. Mol. Biol.
26:365-369, 1967) and transformed into DH10B E. coli
cells for scoring of recombinants. The results of these
experiments showed that minimal sizes for attB and attP 

Determination of the Minimal Sizes of similar to those determined in E. coli also applied in 

Recombination Sequences mammalian cells. Approximately 60-90% of the efficiency 

The following example describes the process for deter- 4 ° of the full-length att sites was achieved with the same 

mining the minimal sequences needed for recognition and reduc f d f prices that worked at 100% efficiency in E. 

recombination by a site-specific recombinase. This process co l 1 '. llkel y because the overaU reactI0n 15 somewhat less 

was used to determine the minimal wild-type attB and attP efficient m the mammalian cell environment, 

sequences functionally recognized by the <j>C31 integrase in 45 experiments to determine the minimal sizes of attB 

bacterial and mammalian cell environments. A similar pro- and a »P provided the information that these recombination 

cess can be used to identify the minimal sequences recog- siies had sizes of 34 and 39 basepairs, respectively. These 

nized by other recombinases of interest, such as the inte- size s are similar to that of the 34-basepair loxP site. A 

grases of phages R4 and TP-901. The minimal attB and attP recombination site of this size will possess active pseudo 

sequences can then be used to identify pseudo- 50 recombination sites in large genomes, such as those of 

recombination sequences, for example as described above mammals and most plants. Thus, it is statistically expected 

for the Cre-lox system. that the pseudo recombination sites for the (|>C31 integrase 

Prior to this study,' the minimal sizes for the +C31 wil1 occur in these genomes. These pseudo recombination 

attachment sites, attB and attP, had not been determined. The sltes re P resent tar § ets for chromosome engineering. 

attB site had been localized to approximately 280 basepairs 55 Example 9 
and the attP region had been localized to 86 basepairs 

(Thorpe and Smith, Proc. Natl. Acad. Sci. USA, 1998). The Determination of the Amount of Heterogeneity 

intramolecular integration assay described in Example 6 was Tolerated in the Core Sequence of a Recombinase 

used to determine the minimal functional sizes for these att s ^ te 

sites. Short double-stranded adaptor molecules containing go The amount of heterogeneity tolerated in the 3-bp core 

att sites of various lengths were created by annealing single- sequence of the attB and attP sequences recognized by the 

stranded oligonucleotides. These shorter sites were used to <|)C31 integrase was determined. Similar methods can be 

replace the full-length att sites in the pBCPB+ assay used to determine the amount of core heterogeneity tolerated 

plasmid, and recombination efficiencies were determined by in the cores of other recombinases of interest, such as the 

electroporation into E. coli. 65 integrases of phages R4 and TP-901. 

To determine the minimal function size of attB, the The (j>C31 integrase catalyzes recombination between attB 

278-basepair full-length attB surrounded by BamHI and and attP sites. These sites have minimal functional lengths of 
34 and 39 basepairs, respectively. While largely distinct in sequence, attB
and attP share a three basepair common core sequence, TTG, that includes the
crossover region. In the case of the 8-basepair core region of the loxP site
targeted by Cre recombinase, it has been found that its sequence is largely
unimportant, as long as it matches between the two recombining sites. To
determine if this behavior applied to the core region of the attB and attP
sites of the φC31 integrase, the effects of mutations within this core
region were examined.

A panel of plasmids was generated in which either attB, attP, or both sites
were altered with a specific single base change. These changes were then
assayed with the intramolecular integration assay in E. coli described in
Example 6. A recombination event results in excision of the lacZ gene
located between the att sites. Thus, when an assay plasmid is transformed
into bacteria expressing φC31 integrase, a site-specific recombination event
is scored as a white colony.

The TTG core was mutated in each position individually to all other base
possibilities. The effects of these mutations in attB were investigated when
paired with a wild-type attP. Conversely, the effects of a mutant attP
paired with a wild-type attB were measured. By combining attB and attP sites
that contained identical mutations, it was determined whether the core
region needed to only match to be effective in recombination.

To carry out these experiments, oligonucleotides bearing the mutations to be
tested were synthesized in the context of attB34 or attP40 (see Example 8).
The mutant oligonucleotides were annealed and cloned into the
chloramphenicol-resistant intramolecular integration assay vector pBCPB+ to
replace the wild-type attB or attP, as in Example 8. Individual plasmids
containing the mutation of interest were assayed for recombination in
E. coli strain DHInt, which carries the kanamycin-resistant integrase
expression plasmid pInt, described in Example 6. Assay plasmid DNA (2 ng)
was electroporated into DHInt, and after a 1 hour recovery period at 37° C.
in rich media, the transformations were plated on LB agar containing
25 mg/ml chloramphenicol, 60 mg/ml kanamycin, and 50 mg/ml X-gal. The plates
were incubated overnight (16-18 hours) at 37° C., after which blue and white
colonies were counted. The recombination fraction was expressed as the
percentage of white colonies out of total colonies. The results of these
experiments are shown in FIG. 6.

The first and third positions of the core showed some flexibility, while the
center position did not. The first position appeared to tolerate only
pyrimidines; the CTG double mutant worked well. The third position of attP
could be changed to any base, and to the other purine for attB. Overall, the
pattern of base substitutions tolerated in the recognition sites for the
φC31 integrase more closely resembled the degree of tolerance for
substitutions typical of the outer palindromes, rather than the core, of the
loxP site. Thus, unlike the situation in the Cre-loxP system, the φC31
integrase has strong base preferences within the cores of its attB and attP
recombination sites, and merely matching any two three-basepair core
sequences will not suffice to generate efficient recombination in this
system.

Example 10

Bimolecular Integration Assay into a Model Chromosome in Mammalian Cells

The following example demonstrates the ability of phage φC31 integrase to
integrate sequences site-specifically and efficiently into a model
chromosome in a mammalian cell.

Example 7 demonstrated that the φC31 integrase efficiently catalyzed
site-specific intramolecular integration in mammalian cells. The next step
was to show that the integrase could catalyze efficient site-specific
integration of exogenous DNA into mammalian chromosomes in cell culture.
EBV-based plasmids provide easy and useful models for chromosomes. EBV
vectors exist in the nucleus, replicate in synchrony with the chromosomes,
and bear chromatin indistinguishable from that of the chromosomes. They can
be easily purified from cells and transformed into E. coli for rapid scoring
of integration events. Thus they have great utility in characterization of
the integration reaction in human cells.

In these experiments, a kanamycin-resistant EBV plasmid was equipped with an
attB site and established in human 293 cells to create a stable
attB-containing human cell line. An ampicillin-resistant plasmid carrying
attP and lacZ was then co-transfected into the attB cell line, along with a
plasmid expressing the φC31 integrase. To assay for integration products,
after three days plasmid DNA was extracted and transformed into bacteria.
Blue colonies that grew on plates containing kanamycin, ampicillin, and Xgal
were scored as integrants, while total colony number could be obtained by
plating on kanamycin alone.

The attB and attP plasmids needed for this study were constructed as
follows. The target EBV based plasmids were based on p220.2 (DuBridge et al,
1987). The control plasmid p220K was made by inserting the kanamycin
resistance gene from the Kan-resistant Genblock (Amersham Pharmacia,
Piscataway, N.J.) into the XmnI site of the ampicillin resistance gene of
p220.2. To make attB-containing p220 plasmids, the ampicillin-resistance
gene of p220.2 was removed by digestion with BspHI. The kanamycin resistance
gene described above was isolated by digestion with PstI, and cloned into
amp-p220.2 with BspHI-PstI linkers (5'CATGAGGCCAAAAAGGCCTGCA3' (SEQ ID
NO:14) and 5'GGCCTTTTTGGCCT3' (SEQ ID NO:15)) to create the plasmid p220K.
The full length attB was removed from the plasmid pTA-attB (Example 6) by
SalI digestion and cloned into the SalI site of p220K, creating the plasmid
p220KattBfull (FIG. 4D). The 35 base pair attB was cloned into the SalI and
BamHI sites of p220K by using the oligonucleotides
5'gatccgatatcgcgcccggggagcccaagggcacgccctggcaccg3' (SEQ ID NO:16) and
5'tcgacggtgccagggcgtgcccttgggctccccgggcgcgatatcg3' (SEQ ID NO:17), creating
the plasmid p220KattB35.

These EBV plasmids, p220K, p220KattBfull, and p220KattB35, were established
in human 293 cells as follows. 293 cells were grown in DMEM containing 9%
fetal bovine serum and 1% penicillin/streptomycin to ~70% confluency in
100 mm plates. 8 µg of p220KattBfull, p220KattB35, or the control p220K were
introduced by transfection with lipofectamine according to the
manufacturer's protocol. At 24 hours post-transfection, the cells were split
1:4, and at 48 hours post-transfection hygromycin selection (350 µg/ml) was
begun. 11 to 14 days after starting selection the cells were expanded and
frozen down.

The attP-containing plasmid pTSAD (FIG. 4E) was constructed as follows. A
multiple cloning site (oligos:
5'AATTACCGCGGGGCGCGCCGTTTAAACGCATGCCAATTGGGCCGGCCG3' (SEQ ID NO:18) and
5'AATTCGGCCGGCCCAATTGGCATGCGTTTAAACGGCGCGCCCCGCGGT3' (SEQ ID NO:19)) was
cloned into the EcoRI site of the plasmid pWTLox2 (Example 2) upstream of
lacZ, regenerating one EcoRI site. The attP site was removed from the
plasmid pTAattP (Example 6) by digestion with EcoRI and cloned into the
regenerated EcoRI site of pWTLox2 to create the plasmid pES1. The lacZ
promoter was removed from pBCSK+ by digestion with PvuII and SacII and
cloned into pES1 which had been digested with PmeI and SacII. The region
containing attP, the lacZ promoter, and the lacZ gene was removed by
digestion with BamHI and BglII and cloned into the BamHI site of pTSA30
(Gregory Phillips, Iowa State University, Ames, Iowa) to create the donor
plasmid pTSAD. pTSA30 and its pTSAD derivative are temperature sensitive for
plasmid replication in E. coli.

To perform the integration assay, EBV plasmid-containing cells were grown to
confluency in DMEM containing 9% fetal bovine serum, 1%
penicillin/streptomycin, and 200 µg/ml hygromycin in 10 cm plates. These
plates were split into eight 60 mm plates and grown in the above medium
without hygromycin for 24-48 hours, until they were approximately 60-80%
confluent. pCMVInt (Example 7, FIG. 4B) and pTSAD were transfected in
equimolar amounts (10 µg total DNA) using 50 µl Superfect (Qiagen, Valencia,
Calif.) according to the manufacturer's protocol. As controls, no DNA, 4 µg
pCMVInt or 6 µg pTSAD were cotransfected with salmon sperm DNA (to 10 µg).
In addition, an equimolar amount of a plasmid encoding the green fluorescent
protein (a derivative of pEGFP-c1, Clontech, Palo Alto, Calif.) with salmon
sperm DNA to 10 µg was transfected in parallel into the EBV
plasmid-containing cells to monitor transfection efficiency.

2.5-3 hours after transfection, the Superfect was removed from the cells and
replaced with serum-containing medium. Cells were fed with medium containing
serum and 50 U/ml DNaseI 24 hours after transfection and harvested 72 hours
after transfection. Low molecular weight DNA was purified by Hirt extraction
(Hirt, J. Mol. Biol. 26:365-369, 1967) and transformed into DH10B E. coli by
electroporation. Also, 24 hours after transfection, transfection efficiency
was measured by counting the green fluorescent protein-expressing cells
relative to the total number of cells. The transfection efficiencies
typically ranged from 6-18%. Because untransfected cells would have no
opportunity to undergo integration but would still contribute EBV plasmids
to the bacterial assay in the form of white colonies, the transfection
efficiency was needed to obtain the correct integration frequency.

In a typical experiment, 15 µl of a transformation was spread on each of
three plates containing kanamycin, Xgal, and IPTG, while 150 µl of the same
transformation was spread on each of three plates containing ampicillin,
kanamycin, Xgal, and IPTG. The bacteria were grown overnight at 42° C. for
approximately 16 h. The elevated temperature prevented replication of pTSAD,
which has a temperature-sensitive plasmid origin of replication. Integrants
were scored as the blue colonies on the plates containing both kanamycin and
ampicillin. Integration frequency was calculated as the number of blue
colonies on kanamycin and ampicillin plates divided by the total number of
colonies on kanamycin plates ×10 for each set of transfections. Raw numbers
for integration frequency were divided by transfection efficiency to obtain
accurate values for integration frequency.

FIG. 7 lists the integration frequencies obtained with each of the EBV
plasmids and the negative controls. Each line of the figure represents a
minimum of three separate transfections. For p220K, which lacks the attB
site, a negligible frequency of blue colonies was detected. Upon analysis,
these plasmids were not integrants, but rather homologous recombination
events that occurred through common amp sequences on the two plasmids. For
p220KattB35, carrying a minimally sized attB, a significant number of blue
colonies were detected. When corrected for the transfection efficiency in
these experiments, the integration frequency was 1.7%. For p220KattBfull,
the integration frequency was even higher, at 7.5%. This increase presumably
reflects a favorable sequence context for the full attB site compared to the
reduced site. Controls in which pCMVInt, pTSAD, and each of the EBV
plasmids, p220K, p220KattBfull, and p220KattB35, were co-transformed
directly into E. coli yielded negligible numbers of blue colonies (0.002% or
less). These controls confirmed that the high frequency integration events
scored above occurred in human cells, not in E. coli.

The integration frequency into an attB site located on an EBV plasmid is
impressively high and several orders of magnitude higher than the
frequencies of random integration or homologous recombination, highlighting
the utility of this system. Furthermore, the integrants are site-specific,
as indicated by restriction mapping of more than 160 of the blue colonies
from the experiments with p220KattB35 and p220KattBfull. In addition, two
integrants each, from the experiments with p220KattB35 and p220KattBfull,
were analyzed at the DNA sequence level across the junctions of the
integration site, confirming that exact site-specific integration had
occurred between attB and attP. FIG. 7 indicates that, as expected, the
reaction requires the presence of both the integrase (pCMVInt) and the attP
target site (pTSAD).

Because EBV vectors are nuclear, chromatinized mini-chromosomes, the high
integration frequency obtained in this system is predictive of the expected
integration frequencies into att sites located on the chromosomes.

Example 11

[Assay] for Integration into the Chromosomes of Mammalian Cells

The following example describes methods used to demonstrate the ability of
phage φC31 integrase to site-specifically integrate sequences into mammalian
chromosomes.

Cell lines carrying the wild-type φC31 attB site are prepared by
transfecting human 293 cells with Lipofectamine and a plasmid carrying the
attB sequence and the hygromycin resistance gene. The cells are grown in
DMEM containing hygromycin and resistant colonies propagated to mass
culture. Integration of the attB sequence is verified by Southern blot
analysis using plasmid sequences as probes. These cell lines are then
transfected with Lipofectamine and a plasmid containing the attP sequence
and a neomycin/G418 resistance gene and a plasmid expressing the φC31
integrase gene under control of the CMV promoter. The G418 antibiotic is
added to the DMEM growth medium approximately 48 hours after transfection.
Selection is maintained for approximately ten days, after which the number
of colonies is scored.

Higher numbers of neomycin resistant colonies are seen in cells
co-transfected with the φC31 integrase-expressing plasmid than in cells that
do not receive the integrase. Likewise, higher numbers of neomycin-resistant
colonies are obtained in cell lines carrying attB compared to the parent 293
cell line lacking attB. These results suggest that the φC31 integrase enzyme
can catalyze the integration of heterologous sequences into a mammalian
genome, both at an integrated attB sequence and at endogenous
pseudo-recombination sequences.

Similar experiments can be conducted using cell lines carrying an integrated
attP hygromycin-resistant plasmid, followed by transfection with a
neomycin-resistant attB plasmid, to demonstrate integration into the
integrated wild-type attP and attP pseudo-sites. Furthermore, similar
experiments can be conducted in other cell types, such as those derived from
other mammalian species or from plants, to test integration activity in
these cellular backgrounds.
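
The colony-count arithmetic described for the assays above (white colonies over total colonies for the intramolecular assay, and blue colonies on kanamycin plus ampicillin plates over ten times the kanamycin-only count, corrected for transfection efficiency, for the bimolecular assay of Example 10) can be sketched as follows. This is a minimal illustration only: the function names and the colony counts in the usage lines are hypothetical and are not data from the experiments.

```python
def recombination_fraction(white, total):
    """Fraction of colonies scored as recombinants (white) in the lacZ excision assay."""
    if total <= 0:
        raise ValueError("total colony count must be positive")
    return white / total


def integration_frequency(blue_kan_amp, total_kan, plating_ratio=10.0,
                          transfection_efficiency=1.0):
    """Bimolecular integration frequency as described in Example 10.

    blue_kan_amp: blue colonies on kanamycin + ampicillin plates (150 ul plated)
    total_kan: all colonies on kanamycin-only plates (15 ul plated)
    plating_ratio: volume ratio between the two platings (150 ul / 15 ul = 10)
    transfection_efficiency: fraction of GFP-positive cells, in (0, 1]
    """
    if total_kan <= 0:
        raise ValueError("total colony count must be positive")
    if not 0.0 < transfection_efficiency <= 1.0:
        raise ValueError("transfection efficiency must be in (0, 1]")
    # Scale the kanamycin-only count up to the volume plated on kan + amp.
    raw = blue_kan_amp / (total_kan * plating_ratio)
    # Untransfected cells contribute plasmid but cannot integrate, so divide
    # the raw frequency by the fraction of cells that were transfected.
    return raw / transfection_efficiency


if __name__ == "__main__":
    # Hypothetical colony counts chosen only to exercise the functions.
    print(f"{recombination_fraction(48, 96):.3f}")
    print(f"{integration_frequency(30, 200, transfection_efficiency=0.12):.3f}")
```

The factor of ten appears in the denominator because ten times more of the transformation was plated on the double-selection plates than on the kanamycin-only plates.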
While the foregoing has been with reference to particular
embodiments of the invention, it will be appreciated by
those skilled in the art that changes in these embodiments
may be made without departing from the principles and
spirit of the invention, the scope of which is defined by the
appended claims.



SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS : 41 

<210> SEQ ID NO 1 
<211> LENGTH: 34 
<212> TYPE: DNA 

<213> ORGANISM: Bacteriophage P1
<400> SEQUENCE: 1 

ataacttcgt atagcataca ttatacgaag ttat 



<210> SEQ ID NO 2 
<211> LENGTH: 34 
<212> TYPE: DNA 

<213> ORGANISM: Saccharomyces cerevisiae
<400> SEQUENCE: 2 

gaagttccta tacttctaga agaataggaa cttc 



<210> SEQ ID NO 3 
<211> LENGTH: 12 



<400> SEQUENCE: 3
gaagcagtgg ta 



<210> SEQ ID NO 4 
<211> LENGTH: 34 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: loxP search pattern 
<220> FEATURE:

<221> NAME/KEY: misc_feature

<222> LOCATION: (1)...(34)

<223> OTHER INFORMATION: n = A,T,C or G

<400> SEQUENCE: 4

ataacttcgt atannnnnnn ntatacgaag ttat 34 
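
A degenerate search pattern such as SEQ ID NO 4 can be used to scan a sequence for candidate sites by treating each n as any base and checking both strands. The sketch below is one minimal way to do this with a regular expression. The helper names are hypothetical, and the pattern and wild-type loxP strings are copied from SEQ ID NOS 4 and 20 of this listing.

```python
import re

COMPLEMENT = str.maketrans("acgt", "tgca")

# SEQ ID NO 4 (loxP search pattern) and SEQ ID NO 20 (wild-type loxP),
# written in the same groups of ten used in the listing.
LOXP_PATTERN = "ataacttcgt" "atannnnnnn" "ntatacgaag" "ttat"
LOXP_WT = "ataacttcgt" "ataatgtatg" "ctatacgaag" "ttat"


def revcomp(seq):
    """Reverse complement of a lowercase-able DNA string."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def pattern_to_regex(pattern):
    """Compile a pattern in which 'n' stands for any base.

    A lookahead is used so that overlapping matches are also reported.
    """
    body = "".join("[acgt]" if ch == "n" else ch for ch in pattern.lower())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, pattern):
    """Return (strand, start, match) for every match on either strand.

    Minus-strand starts are given in reverse-complement coordinates.
    """
    rx = pattern_to_regex(pattern)
    hits = []
    for strand, text in (("+", sequence.lower()), ("-", revcomp(sequence))):
        for m in rx.finditer(text):
            hits.append((strand, m.start(), m.group(1)))
    return hits


if __name__ == "__main__":
    for hit in find_sites("gg" + LOXP_WT + "cc", LOXP_PATTERN):
        print(hit)
```

Because the fixed arms of the pattern are inverted repeats, a wild-type site is reported once on each strand; a real genome scan would also need to handle ambiguous bases and much larger inputs.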



<210> SEQ ID NO 5 
<211> LENGTH : 34 
<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE : 

<223> OTHER INFORMATION: loxP search pattern 
<220> FEATURE : 

<221> NAME/KEY: misc_feature 

<222> LOCATION: (1)...(34) 

<223> OTHER INFORMATION: n = A,T,C or G 

<400> SEQUENCE: 5 

atnacnncnt atannntann ntatangnng tnat 34 



<210> SEQ ID NO 6 
<211> LENGTH: 30 
<212> TYPE: DNA 
<213> ORGANISM: Artificial Sequence
<223> OTHER INFORMATION: Description of Artificial Sequence: primer 
<400> SEQUENCE: 6 
gaactagtcg tagggtcgcc gacatgacac 

<210> SEQ ID NO 7 
<211> LENGTH: 30 

<213> ORGANISM: Artificial Sequence 

<223> OTHER INFORMATION: Description of Artificial Sequence: primer 

<400> SEQUENCE: 7 

gtggatccgg gtgtctcgct acgccgctac 

<210> SEQ ID NO 8
<211> LENGTH: 21
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: linker 
<400> SEQUENCE : 8 
gctcggccaa aaaggcctgc a 

<210> SEQ ID NO 9 
<211> LENGTH: 14 
<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE : 

<223> OTHER INFORMATION: Description of Artificial Sequence: linker 
<400> SEQUENCE: 9
ggcctttttg gccg 



<210> SEQ ID NO 10
<211> LENGTH: 30
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence



<223> OTHER INFORMATION: Description of Artificial Sequence: primer 

<400> SEQUENCE: 10 

caggtaccgt cgacgatgta ggtcacggtc 

<210> SEQ ID NO 11 
<211> LENGTH: 22 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: primer 
<400> SEQUENCE : 11 
gtcgacatgc ccgccgtgac cg

<210> SEQ ID NO 12 
<211> LENGTH: 26 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: primer 
<400> SEQUENCE: 12 
cgactagtac tgacggacac accgaa
<210> SEQ ID NO 13
<211> LENGTH: 27
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<223> OTHER INFORMATION: Description of Artificial Sequence: primer 
<400> SEQUENCE: 13 

gtactagtcg cgctcgcgcg actgacg 27 



<210> SEQ ID NO 14
<211> LENGTH: 22
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Description of Artificial Sequence: linker
<400> SEQUENCE: 14
catgaggcca aaaaggcctg ca



<210> SEQ ID NO 15
<211> LENGTH: 14
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence
<220> FEATURE:

<223> OTHER INFORMATION: Description of Artificial Sequence: linker
<400> SEQUENCE: 15
ggcctttttg gcct



<210> SEQ ID NO 16 
<211> LENGTH: 46 
<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: p220KattB35
first oligonucleotide 

<400> SEQUENCE: 16 

gatccgatat cgcgcccggg gagcccaagg gcacgccctg gcaccg 46



<210> SEQ ID NO 17
<211> LENGTH: 46
<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: p220KattB35 
second oligonucleotide 

<400> SEQUENCE: 17 

tcgacggtgc cagggcgtgc ccttgggctc cccgggcgcg atatcg 46 
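
Oligonucleotide pairs such as SEQ ID NOS 16 and 17 are designed to anneal into a duplex with single-stranded ends that fit the cloning sites. The sketch below is a minimal consistency check on such a pair, assuming a 4-base 5' overhang on each strand; the function names are hypothetical, and the two sequences are copied from this listing.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

# SEQ ID NO 16 and SEQ ID NO 17 from the listing, in groups of ten.
SEQ16 = "gatccgatat" "cgcgcccggg" "gagcccaagg" "gcacgccctg" "gcaccg"
SEQ17 = "tcgacggtgc" "cagggcgtgc" "ccttgggctc" "cccgggcgcg" "atatcg"


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def anneal(top, bottom, overhang=4):
    """Check that two oligos pair with 5' overhangs of the given length.

    Returns (top 5' overhang, double-stranded region, bottom 5' overhang),
    all written 5'->3', or raises ValueError if the strands do not pair.
    """
    top, bottom = top.lower(), bottom.lower()
    duplex = top[overhang:]
    bottom_rc = revcomp(bottom)
    # The part of the bottom strand that pairs with the top strand is its
    # reverse complement minus the bases that form the bottom 5' overhang.
    if duplex != bottom_rc[:len(bottom_rc) - overhang]:
        raise ValueError("oligos are not complementary over the duplex region")
    return top[:overhang], duplex, bottom[:overhang]


if __name__ == "__main__":
    top_end, duplex, bottom_end = anneal(SEQ16, SEQ17)
    print(top_end, len(duplex), bottom_end)
```

For this pair the check yields a 5' gatc end and a 5' tcga end, which are the overhangs left by BamHI and SalI digestion respectively, consistent with cloning into the SalI and BamHI sites of p220K as described.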



<210> SEQ ID NO 18
<211> LENGTH: 48
<212> TYPE: DNA

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: multiple 
cloning site first oligonucleotide 

<400> SEQUENCE: 18

aattaccgcg gggcgcgccg tttaaacgca tgccaattgg gccggccg 48



<210> SEQ ID NO 19 
<211> LENGTH: 48 

<213> ORGANISM: Artificial Sequence 



<223> OTHER INFORMATION: Description of Artificial Sequence: multiple
cloning site second oligonucleotide 

<400> SEQUENCE: 19 

aattcggccg gcccaattgg catgcgttta aacggcgcgc cccgcggt 48 
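
The multiple cloning site oligonucleotide of SEQ ID NO 18 can be checked for the restriction sites it carries, including the SacII and PmeI sites used in the construction of pTSAD. The sketch below scans the top strand for a small table of recognition sequences. The table holds standard recognition sequences supplied here for illustration; it is an assumption and not part of the listing, and the function names are hypothetical.

```python
# SEQ ID NO 18 from the listing, in groups of ten.
MCS_TOP = "aattaccgcg" "gggcgcgccg" "tttaaacgca" "tgccaattgg" "gccggccg"

# Recognition sequences (all palindromic, so one strand is enough to scan).
# Illustrative table only; not taken from the sequence listing.
SITES = {
    "SacII": "ccgcgg",
    "AscI": "ggcgcgcc",
    "PmeI": "gtttaaac",
    "SphI": "gcatgc",
    "MfeI": "caattg",
    "FseI": "ggccggcc",
}


def find_all(seq, site):
    """Return every 0-based start position of site in seq, overlaps included."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def map_sites(seq, sites=SITES):
    """Map each enzyme name to the positions of its site in seq."""
    seq = seq.lower()
    return {name: find_all(seq, site) for name, site in sites.items()}


if __name__ == "__main__":
    for name, positions in map_sites(MCS_TOP).items():
        print(name, positions)
```

Each of the six sites occurs once in the oligonucleotide, which is what a multiple cloning site is expected to provide.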



<210> SEQ ID NO 20 
<211> LENGTH: 34 
<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 

<223> OTHER INFORMATION: Description of Artificial Sequence: wild-type 
loxP site 

<400> SEQUENCE: 20 

ataacttcgt ataatgtatg ctatacgaag ttat



<210> SEQ ID NO 21 

<211> LENGTH: 34 

<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: "psi" 
loxh7q21 

<400> SEQUENCE: 21



<210> SEQ ID NO 22
<211> LENGTH: 34
<212> TYPE : DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence:" psi" 
coreh7q21 

<400> SEQUENCE: 22 

ataacttcgt atatatgtat atatacgaag ttat 34 



<210> SEQ ID NO 23
<211> LENGTH: 34
<212> TYPE: DNA 

<400> SEQUENCE: 23 

acaaccattt ataatatata atatatgatg ttat 34 



<210> SEQ ID NO 24 

<211> LENGTH: 34 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 24 

atacatacgt atatatgtat atatacatat atat 34 



<210> SEQ ID NO 25 
<211> LENGTH: 34 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 25 

atatacacgt atatatatat atatacgtat atat 34 
<210> SEQ ID NO 26
<211> LENGTH: 34

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 26 

caaacaaggt atatgcctgt atatacgaaa tggt 

<210> SEQ ID NO 27 

<211> LENGTH: 34 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 27 

atatatacgt atatatacat atatacgtat atat 

<210> SEQ ID NO 28 

<211> LENGTH: 34 

<212> TYPE : DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 28

atatatacgt atatatacat atatacacat atat 

<210> SEQ ID NO 29 

<211> LENGTH : 34 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 29 

ataaatatgt atatgtatat gtatacgtat ataa 

<210> SEQ ID NO 30 

<211> LENGTH: 34 

<212> TYPE: DNA 

<213> ORGANISM : Homo sapiens 

<400> SEQUENCE: 30 

atatatatgt atatgtatat gtatacgtat atat 

<210> SEQ ID NO 31 

<211> LENGTH: 34 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 31 

atatatacgt atatacacat atatacgtat atac 

<210> SEQ ID NO 32 
<211> LENGTH: 34 
<212> TYPE: DNA 
<213> ORGANISM: Mus sp. 

<400> SEQUENCE: 32 

atattgacat atattataaa gtataagtag ttat 

<210> SEQ ID NO 33 
<211> LENGTH: 34 
<212> TYPE: DNA 
<213> ORGANISM: Mus sp. 

<400> SEQUENCE: 33 

gtaactgagt atatgcatat atatacgtat atat 
<210> SEQ ID NO 34
<211> LENGTH: 34



<400> SEQUENCE: 34 

ataacatatt atatttatat atatatatat ttaa 



<210> SEQ ID NO 35 
<211> LENGTH : 34 
<212> TYPE : DNA 
<213> ORGANISM: Mus sp. 

<400> SEQUENCE: 35 

atatatatgt atatatatac atatacatac atat 34



<210> SEQ ID NO 36 
<211> LENGTH: 34 
<212> TYPE : DNA 
<213> ORGANISM: Mus sp. 

<400> SEQUENCE: 36 

agcacttcct atataacttc atatacgtag ctcc 34 



<210> SEQ ID NO 37 
<211> LENGTH: 34 
<212> TYPE : DNA 

<213> ORGANISM: Caenorhabditis elegans 
<400> SEQUENCE: 37 

atagcgtcgt ataatccgaa atatacagat ctat 34 



<210> SEQ ID NO 38 
<211> LENGTH: 34 
<212> TYPE: DNA 

<213> ORGANISM: Arabidopsis thaliana 
<400> SEQUENCE: 38

ctagtttggt atatatatat atatactaat ttat 34



<210> SEQ ID NO 39 
<211> LENGTH: 34 
<212> TYPE : DNA 

<213> ORGANISM: Arabidopsis thaliana 
<400> SEQUENCE : 39 

ataactttgt atagtttaac ttatattagg tact 34 



<210> SEQ ID NO 40 
<211> LENGTH: 34 
<212> TYPE : DNA 

<213> ORGANISM: Flaveria anomala 
<400> SEQUENCE: 40 

atcagttagt atatattcgt atatacgtag atat 34 



<210> SEQ ID NO 41
<211> LENGTH: 34

<212> TYPE: DNA 

<213> ORGANISM: Saccharomyces cerevisiae 

<400> SEQUENCE: 41 



34 
What is claimed is:

1. A method of site-specifically integrating a polynucle- 
otide sequence of interest in the genome of an isolated 
eucaryotic cell, said method comprising: 

introducing (i) a circular targeting construct, comprising a 
first recombination site and the polynucleotide 
sequence of interest, and (ii) an expression cassette 
comprising a polynucleotide encoding a site-specific 
recombinase into the isolated eucaryotic cell, wherein 
(a) the genome of said isolated eucaryotic cell com- 
prises a second recombination site native to the 
genome, (b) recombination between the first and sec- 
ond recombination sites occurs in the presence of the 
site-specific recombinase, and (c) the site-specific 
recombinase is selected from the group consisting of 
φC31 phage recombinase, TP901-1 phage
recombinase, and R4 phage recombinase; and 

maintaining the isolated eucaryotic cell under conditions 
that allow recombination between said first and second 
recombination sites, wherein the recombination is 
mediated by the site-specific recombinase and the 
recombination results in site-specific integration of the 
polynucleotide sequence of interest in the genome of 
the isolated eucaryotic cell. 

2. The method of claim 1, wherein said first and second 
recombination sites are a bacterial genomic recombination 
site (attB) and a phage genomic recombination site (attP). 

3. The method of claim 2, wherein (i) said second 
recombination site comprises a pseudo-attP site, and (ii) said 
first recombination site comprises the attB site. 

4. The method of claim 3, wherein said site-specific 
recombinase is selected from the group consisting of φC31
phage recombinase, TP901-1 phage recombinase, and R4 
phage recombinase. 

5. The method of claim 2, wherein (i) said second 
recombination site comprises a pseudo-attB site, and (ii) 
said first recombination site comprises the attP site. 

6. The method of claim 5, wherein said site-specific 
recombinase is φC31 phage recombinase.

7. The method of claim 5, wherein said site-specific 
recombinase is R4 phage recombinase. 

8. The method of claim 5, wherein said site-specific 
recombinase is TP901-1 phage recombinase. 

9. The method of claim 2, wherein (i) attB comprises a 
first DNA sequence (attB5'), a bacterial core region, and a 
second DNA sequence (attB3') in the order attB5'-bacterial 
core region-attB3', (ii) attP comprises a first DNA sequence 
(attP5'), a phage core region, and a second DNA sequence 
(attP3') in the order attP5'-phage core region-attP3', and (iii) 
the recombinase mediates production of recombination- 
product sites that can no longer act as a substrate for the 
recombinase, said recombination-product sites comprising 
the order attB5'-(recombination-product site)-attP3' and 
attP5'-(recombination-product site)-attB3'. 

10. The method of claim 9, wherein (i) said second 
recombination site is a pseudo-attP site, and said second 
recombination site comprises a first DNA sequence (attT5'), 
a core region B, and a second DNA sequence (attT3') in the 
order attT5'-core region B-attT3', (ii) said first recombina- 
tion site is an attB site comprising attB5'-bacterial core 
region-attB3', in the order recited and (iii) the recombinase 
mediates production of recombination-product sites that can 
no longer act as a substrate for the recombinase, said 
recombination-product sites comprising the order attT5'- 
(recombination-product site)-attB3'{polynucleotide of 
interest}attB5'-(recombination-product site)-attT3'. 

11. The method of claim 9, wherein (i) said second 
recombination site is a pseudo-attB site, and said second 
recombination site comprises a first DNA sequence (attT5'), 
a core region B, and a second DNA sequence (attT3') in the 
order attT5'-core region B-attT3', (ii) said first recombina- 
tion site is an attP site comprising attP5'-phage core region- 
attP3', in the order recited and (iii) the recombinase mediates 
production of recombination-product sites that can no longer 
act as a substrate for the recombinase, said recombination- 
product sites comprising the order attT5'-(recombination- 
product site)-attP3'{polynucleotide of interest}attP5'- 
(recombination-product site)-attT3'. 

12. The method of claim 1, wherein said circular targeting 
construct further comprises a bacterial origin of replication. 

13. The method of claim 1, wherein said circular targeting 
construct further comprises a selectable marker. 

14. The method of claim 13, wherein said selectable 
marker provides for either positive or negative selection. 

15. The method of claim 1, wherein said polynucleotide 
sequence of interest comprises a promoter sequence. 

16. The method of claim 1, wherein said polynucleotide 
sequence of interest comprises at least one expression 
cassette. 

17. The method of claim 16, wherein said expression 
cassette of said polynucleotide sequence of interest com- 
prises a promoter operably linked to a polynucleotide 
sequence that encodes a product. 

18. The method of claim 17, wherein said product is an 
RNA molecule. 

19. The method of claim 17, wherein said product is a 
polypeptide. 

20. The method of claim 1, wherein the expression 
cassette comprising a polynucleotide encoding the site- 
specific recombinase is carried on a transient expression 
vector. 

21. The method of claim 1, wherein said expression 
cassette comprising a polynucleotide encoding the site- 
specific recombinase is introduced into the isolated eukary- 
otic cell before introducing the circular targeting construct. 

22. The method of claim 1, wherein said expression 
cassette comprising a polynucleotide encoding the site- 
specific recombinase is introduced into the isolated eukary- 
otic cell concurrently with introducing the circular targeting 
construct. 

23. The method of claim 1, wherein said expression 
cassette comprising a polynucleotide encoding the site- 
specific recombinase is introduced into the isolated eukary- 
otic cell after introducing the circular targeting construct. 

24. A vector for site-specific integration of a polynucle- 
otide sequence into the genome of an isolated eucaryotic 
cell, said vector comprising, 

(i) a circular backbone vector, 

(ii) a polynucleotide of interest operably linked to a 
eucaryotic promoter, and 

(iii) a single recombination site, wherein said single 
recombination site comprises a polynucleotide 
sequence that recombines with a second recombination 
site in the genome of said isolated eukaryotic cell and 
said recombination occurs in the presence of a site- 
specific recombinase selected from the group consist- 
ing of φC31 phage recombinase, TP901-1 phage 
recombinase, and R4 phage recombinase. 

25. The vector of claim 24, wherein said circular back- 
bone vector is a procaryotic or eucaryotic vector. 

26. The vector of claim 24, wherein said polynucleotide 
of interest operably linked to a eucaryotic promoter further 
comprises additional control elements. 

27. The vector of claim 24, wherein the site-specific 
recombinase is φC31 phage recombinase. 

28. The vector of claim 24, wherein said first and second 
recombination sites are a bacterial genomic recombination 
site (attB) and a phage genomic recombination site (attP). 

29. The vector of claim 28, wherein said first recombi- 
nation site is either attB or attP. 

30. The vector of claim 29, wherein said recombinase is 
the site-specific φC31 phage recombinase. 

31. The vector of claim 24, wherein said circular back- 
bone vector further comprises a bacterial origin of replica- 
tion. 

32. The vector of claim 24, wherein said circular back- 
bone vector further comprises a selectable marker. 

33. The vector of claim 32, wherein said selectable marker 
provides for either positive or negative selection. 

34. A kit for site-specific integration of a polynucleotide 
sequence into the genome of an isolated eucaryotic cell, said 
kit comprising, 

(i) a vector of claim 24, and 



recombination sites occurs in the presence of the site- 
specific recombinase and said site-specific recombinase 
is selected from the group consisting of φC31 phage 
recombinase, TP901-1 phage recombinase, and R4 
phage recombinase. 

35. A method of modifying a genome of an isolated 
eucaryotic cell, said method comprising the steps of 

(a) providing an isolated eucaryotic cell that does not 
comprise an attB or attP recombination site recognized 
by a site-specific recombinase selected from the group 
consisting of φC31 phage recombinase, TP901-1 phage 
recombinase, and R4 phage recombinase; and 

(b) inserting an attB or an attP recombination site into the 
genome of the isolated eucaryotic cell, wherein said 
recombination site is recognized by said site specific 
recombinase, thereby modifying the genome of the 
eukaryotic cell. 



36. The method of claim 35, wherein said inserting in step 
(b) is carried out by transforming the cell with a polynucle- 
otide containing the attB or attP recombination site under 
conditions such that the polynucleotide is inserted into the 
genome. 

37. The method of claim 35, further comprising 
introducing (i) a circular targeting construct, comprising 

an attP recombination site and a polynucleotide 
sequence of interest, and (ii) an expression cassette 
comprising a polynucleotide encoding the site-specific 
recombinase into the isolated eucaryotic cell, of step 
(b), and 
maintaining the isolated eukaryotic cell under conditions 
that allow recombination between said attP and attB 
recombination sites, wherein the recombination occurs 
in the presence of the site-specific recombinase and the 
result of the recombination is site-specific integration 
of the polynucleotide sequence of interest in the 
genome of the isolated eukaryotic cell. 

38. The method of claim 35, further comprising 
introducing (i) a circular targeting construct, comprising 

an attB recombination site and a polynucleotide 
sequence of interest, and (ii) an expression cassette 
comprising a polynucleotide encoding the site-specific 
recombinase into the isolated eucaryotic cell, of step 
(b), 

maintaining the isolated eukaryotic cell under conditions 
that allow recombination between said attB and attP 
recombination sites, wherein the recombination occurs 
in the presence of the site-specific recombinase and the 
result of the recombination is site-specific integration 
of the polynucleotide sequence of interest in the 
genome of the isolated eukaryotic cell. 



